|
Box-Whisker Chart |
Posted by Eric on Mar-14-2014 21:41 |
|
Hi There,
I am trying to draw a box-whisker chart with a large number of data points per group.
E.g. Group A data points = {0.003,0.042,0.0432,0.01,0.05,0.006,0.7,...
}
Is there a way that I can simply pass those data points per group and let ChartDirector
determine the minimum, 1st quartile, median, 3rd quartile and maximum values
automatically/on-the-fly?
Please advise. Thanks. |
Re: Box-Whisker Chart |
Posted by Peter Kwan on Mar-15-2014 02:51 |
|
Hi Eric,
ChartDirector includes an ArrayMath utility class that can compute the percentiles. For
example, in Java/C#, it is like:
ArrayMath m = new ArrayMath(GroupADataPointsArray);
double max = m.max();
double min = m.min();
double med = m.med();
double p25 = m.percentile(25);
double p75 = m.percentile(75);
You can then use the computed values to plot the chart.
Hope this can help.
Regards
Peter Kwan |
Re: Box-Whisker Chart |
Posted by Eric on Mar-20-2014 10:17 |
|
Thanks Peter for pointing that out. I will try it, but before I do so: is it fast enough for
a large data set (e.g. 2 million data points)?
Thanks. |
Re: Box-Whisker Chart |
Posted by Peter Kwan on Mar-20-2014 18:11 |
|
Hi Eric,
The algorithm used by ChartDirector ArrayMath is one of the most efficient for general
data. So it should be close to as fast as the computer can possibly compute the percentile.
Note that whether the percentile is computed inside ChartDirector or outside using
external code, it still takes time for sufficiently large data sets. Using ArrayMath will not
make the code slower.
If your data come from a database, sometimes the database may have additional
information regarding the data. For example, the database may have indexed the data. In
this case, the database can probably compute it faster (and can be much faster with a
suitable SQL query), as it has the index.
Sometimes there may be certain features of the data that you know but ChartDirector
does not know. For example, the data may be sorted. In this case, you may be able to
write code that computes the percentile faster by taking advantage of this knowledge.
If your database connection is not fast (e.g. the database is on a remote network), it
may be faster to use the database to compute the percentile even if the data are not
indexed by the database. Although the database will probably be slower than
ChartDirector in computing the percentile (as for ChartDirector, the data are in memory, while
for the database, the data are probably on disk), the database only needs to return 5
values: the max, min, med, p25 and p75. If the computation is done by ChartDirector, the
database needs to return 2 million records. So the efficiency of ChartDirector can be
offset by the database overhead for returning 2 million records.
So in brief, ChartDirector should be one of the most efficient in the "general case"
(without any assumptions). However, in real cases, there are often additional
assumptions (data are indexed or sorted, or the database connection is slow, etc). In such
cases, a method that is tailored for those assumptions may be faster. In particular, using
the database can be fast because the database is by design optimized for aggregating
data.
Regards
Peter Kwan |
Re: Box-Whisker Chart |
Posted by Naveen Michaud-Agrawal on Mar-25-2014 00:32 |
|
Hi Peter,
Does the above approach require a separate scan of the data for each percentile returned?
Would it be possible to add an API to ArrayMath that takes a series of percentile values to return?
I.e.:
ArrayMath m = new ArrayMath(GroupADataPointsArray);
DoubleArray percentiles = m.percentiles({0., 25., 50., 75., 100.});
Regards,
Naveen |
Re: Box-Whisker Chart |
Posted by Peter Kwan on Mar-25-2014 02:36 |
|
Hi Naveen Michaud-Agrawal,
I have checked and found that whereas ChartDirector is fast at computing one percentile,
it is not optimized for computing many percentiles on an unchanging array.
To compute a percentile, the obvious method is to sort the array, and then get the item in
the required position (e.g. in the middle position for the median). ChartDirector optimizes it
further. It does not even fully sort the array. It only performs a "partial sort" to the
extent that is necessary to obtain the required percentile. This "partial sort" may not be
usable for another percentile value, which means that if another percentile value is needed, it
needs to perform another partial sort.
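To illustrate the "partial sort" idea in general terms, here is a minimal C++ sketch that uses std::nth_element from the standard library. This is not ChartDirector's actual implementation, which is not shown in this thread; the function name percentileBySelection is hypothetical, and the interpolation rule is assumed to be plain linear interpolation between neighbouring elements.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a ChartDirector API): percentile by selection.
// Takes the data by value because std::nth_element reorders its input.
// Requires: !data.empty(), 0 <= percentile <= 100.
double percentileBySelection(std::vector<double> data, double percentile)
{
    // Fractional position of the percentile in the (conceptually) sorted data
    double pos = percentile * (data.size() - 1) / 100.0;
    std::size_t i = static_cast<std::size_t>(std::floor(pos));

    // Partial sort: afterwards data[i] holds the value a full sort would put
    // there, elements before it are <= data[i], elements after are >= data[i].
    std::nth_element(data.begin(), data.begin() + i, data.end());
    double lo = data[i];
    if (i + 1 == data.size() || pos == i)
        return lo;

    // The next order statistic is the smallest element of the upper partition.
    double hi = *std::min_element(data.begin() + i + 1, data.end());
    return lo + (pos - i) * (hi - lo);
}
```

Each call only arranges the array as far as one position requires, so asking for a second percentile means partitioning again. That is why a single full sort tends to pay off once several percentiles are needed from the same data.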
For your case, maybe you can consider sorting the array with your own code, and then getting
the percentiles. It only takes one line of code to sort the array, and it is easy to obtain
a percentile from a sorted array (maybe 3 lines of code):
std::sort(myData, myData + dataLength);
double p25 = getPercentile(myData, dataLength, 25);
double p50 = getPercentile(myData, dataLength, 50);
double p75 = getPercentile(myData, dataLength, 75);
.....
where getPercentile is:
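// Requires <algorithm> (std::min) and <cmath> (floor).
// data != null and sorted ascending, len > 0, 0 <= percentile <= 100
double getPercentile(double *data, int len, double percentile)
{
    // Map the percentile to a fractional index into the sorted array
    percentile *= (len - 1) / 100.0;
    int i = (int)floor(percentile);
    // Interpolate linearly between the two neighbouring elements
    return data[i] + (percentile - i) * (data[std::min(len - 1, i + 1)] - data[i]);
}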
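Putting the sort-once approach together for the original box-whisker question, a self-contained sketch might look like the following. The BoxStats struct and the summarize and percentileOfSorted names are hypothetical and only for illustration; the interpolation is the same as in getPercentile, rewritten for std::vector.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the five values a box-whisker symbol needs.
struct BoxStats { double min, q1, median, q3, max; };

// Same linear interpolation as getPercentile; expects sorted, non-empty input.
static double percentileOfSorted(const std::vector<double>& s, double p)
{
    double pos = p * (s.size() - 1) / 100.0;
    std::size_t i = static_cast<std::size_t>(std::floor(pos));
    std::size_t j = std::min(s.size() - 1, i + 1);
    return s[i] + (pos - i) * (s[j] - s[i]);
}

// Sorts a copy of the group once, then reads all five values from it.
// Requires: !group.empty().
BoxStats summarize(std::vector<double> group)
{
    std::sort(group.begin(), group.end());
    return { group.front(),
             percentileOfSorted(group, 25),
             percentileOfSorted(group, 50),
             percentileOfSorted(group, 75),
             group.back() };
}
```

Calling summarize once per group gives one set of five values per box. Those values can then be collected into the per-group arrays that the chart is plotted from.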
Hope this can help.
Regards
Peter Kwan |
|