|
performance issues |
Posted by MBB on Jun-26-2009 07:14 |
|
Hello,
I downloaded the VB/COM version of your very nice software and quickly generated a line plot with lots of traces and a scatter plot with two traces plotted against each other.
My company is in search of a 3rd party component like ChartDirector. We handle rather large sets of scientific data. Plotting matrices that have 500 rows and 5000 columns is not uncommon; sometimes there are more than 20000 columns. We often want to make a single plot with all the rows overlaid. So I am interested in performance when drawing lines.
Do you know how the relative speed of plotting is affected by the ChartDirector implementation mode? That is, can I expect to draw faster if I am using the C++ version or the .NET version? I know that I can download these other two variants and run the test myself, but I imagine you already have experience in this area. ;=)
TIA,
MBB |
Re: performance issues |
Posted by Peter Kwan on Jun-26-2009 13:23 |
|
Hi MBB,
Compared to other charting software, we feel that ChartDirector is one of the fastest on the market.
That said, we have only tested ChartDirector up to 2000000 data points for a line chart, on a chart 500000 x 80 pixels in size, and it works normally. We realize it is not meaningful to draw a chart with many more data points than there are pixels on the screen, and we think that in practice no one should need an image that large, so we have not tested further.
I see in your case, you have 500 rows x 20000 columns = 10000000 points, which exceeds what we have tested, but I think it should work fine.
The following are some tips if you need to plot millions of points:
(a) If you do not have that many pixels on the screen, you may consider aggregating the data points before plotting them.
If you plot all points directly on the screen, you lose information. This is because there are not enough pixels on the screen, so the data points overlap and overwrite each other.
If done correctly, aggregating the data first does not lose any more information than plotting directly, and may even preserve more. This is because you are aggregating in a controlled way, and you can obtain aggregation statistics. So aggregating the data may produce a more informative and detailed chart than direct plotting.
For a line chart, suppose the line flows from left to right, and you have 800000 data points, but the plot area is just 1000 pixels wide. So around 800 data points will fall on the same horizontal pixel and horizontally overlap.
One way to aggregate the data is to reduce each group of 800 data points to 2 points: the maximum and the minimum value. The 800000 values are then reduced to 1000 maximum values and 1000 minimum values. You plot one line using the maximum values only and another line using the minimum values only, and fill the region between the two lines with a lighter color. In this way, you get a chart that is essentially the same as the original chart, but looks nicer. It is also much faster if the data comes from a database, because the aggregation can be done by the database itself (typically using an SQL GROUP BY statement): instead of returning 800000 data points, the database only needs to return 2000 points.
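As a rough illustration of the max/min reduction described above, here is a minimal sketch in plain Python. It is not ChartDirector code; the function name min_max_per_pixel and its parameters are hypothetical, and it assumes the data points are evenly spaced along the x-axis, so that equal-sized runs of points map to one pixel column each.

```python
def min_max_per_pixel(values, width):
    """Reduce `values` to at most `width` (min, max) pairs.

    Hypothetical helper: assumes evenly spaced x values, so each
    consecutive run of points lands on one horizontal pixel.
    """
    n = len(values)
    if n == 0 or width <= 0:
        return [], []
    # Never use more buckets than points, so no bucket is empty.
    buckets = min(width, n)
    mins, maxs = [], []
    for i in range(buckets):
        start = i * n // buckets
        end = (i + 1) * n // buckets
        chunk = values[start:end]
        mins.append(min(chunk))
        maxs.append(max(chunk))
    return mins, maxs


if __name__ == "__main__":
    # Synthetic data standing in for one long trace.
    data = [(i * 37) % 1000 for i in range(800000)]
    lows, highs = min_max_per_pixel(data, 1000)
    # Two series of 1000 points each replace the 800000 originals.
    print(len(lows), len(highs))
```

The two returned lists would then be plotted as two separate lines, with the region between them filled.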
Instead of using max and min values, you can aggregate into 3 values (max/min/avg). Your chart now shows the max/min/avg values, and it is more informative than just directly plotting the data points.
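If the data sits in a database, the max/min/avg aggregation could be pushed into the query with GROUP BY. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name samples, its columns idx and value, and both function names are assumptions made for illustration only; a real schema and the grouping expression will differ.

```python
import sqlite3


def build_demo_db(n):
    """Create an in-memory database with a hypothetical `samples` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (idx INTEGER PRIMARY KEY, value REAL)")
    rows = [(i, float((i * 37) % 1000)) for i in range(n)]
    conn.executemany("INSERT INTO samples (idx, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def aggregate_in_db(conn, points_per_pixel):
    """Return one (pixel, min, max, avg) row per group of points.

    Integer division of idx by points_per_pixel assigns each point
    to a pixel column; GROUP BY then aggregates within each column.
    """
    sql = """
        SELECT idx / ? AS pixel, MIN(value), MAX(value), AVG(value)
        FROM samples
        GROUP BY pixel
        ORDER BY pixel
    """
    return conn.execute(sql, (points_per_pixel,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db(8000)
    for row in aggregate_in_db(conn, 800):
        print(row)
```

Only the aggregated rows leave the database, so the application code never has to loop over the raw points.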
Of course, the exact method of aggregation depends on the nature of the chart and the nature of the data. For a bar chart or area chart, just taking the maximum value is enough. For a 2D scatter chart without any flow direction, you may need to plot all points without aggregation.
(b) ChartDirector is very fast at plotting the chart. But apart from plotting the chart, ChartDirector can also generate "hot spots" for the chart, so that you can show a tooltip for each data point when the mouse is over that point, and make the data points clickable. For millions of points, creating millions of clickable regions on the screen is much more CPU and memory intensive than plotting the chart itself. So for more than a few thousand clickable regions, we suggest disabling the hot spots entirely and just plotting the chart.
(c) For speed, the .NET and Java editions are slower than the C++ edition, but not by much. The PHP/Perl/Python/Ruby/ASP/COM/VB editions are similar in speed to C++ (they are all wrappers around the C++ version). However, if you use PHP/Perl/Python/Ruby/ASP/COM/VB, your own code could be slow. (For example, writing a loop to read 10000000 data points from a database query could be slow in PHP, but 2000 aggregated points should be OK.)
Hope this helps.
Regards
Peter Kwan |
Re: performance issues |
Posted by MBB on Jun-27-2009 02:21 |
|
Hi, Peter,
Thanks for your rapid and very substantive reply.
I will think about your suggestions and try them out to see how they affect the time it takes to produce a plot.
MBB |
|