|
A possible addition to scatter points: multivariate kernel density estimation |
Posted by Daniel on Sep-24-2012 17:13 |
|
I recently stumbled on a series of works on "multivariate kernel density estimation", specifically the bivariate variant, by searching for "kde2d" on Google Images.
It gave me food for thought. Among others, this blog has some brilliant examples:
http://gettinggeneticsdone.blogspot.fr/2012/07/fix-overplotting-with-colored-contour.html
These graphics are not only nice to look at but also convey extra information, especially in cases where least-squares regression fitting is not a solution.
Maths is certainly not my strong point any more, a question of skills (and possibly age as well). But it looks like this is well-documented material and possibly suited to an algorithmic implementation as well:
http://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation
Could that functionality possibly be added to the already fine scatter-point regression-fitting features, especially in the 2-D versions? IMHO that would make ChartDirector even more compelling :)
Kind regards, and thanks for the great work! |
Re: A possible addition to scatter points: multivariate kernel density estimation |
Posted by Daniel on Sep-24-2012 21:52 |
|
As an addendum to the previous message, I understand that there may ALREADY be some alternative way to produce some of the output mentioned above via "smoothing".
Should that be the case, I'd welcome the trick needed to add a shade-based layer to a scatter chart, or to transform the scatter points into such a shaded view! |
Re: A possible addition to scatter points: multivariate kernel density estimation |
Posted by Peter Kwan on Sep-25-2012 03:54 |
|
Hi Daniel,
Thanks for the suggestion. The "multivariate kernel density estimation" is really an interesting statistics technique. We will certainly consider that in future versions of ChartDirector.
As ChartDirector is not designed to be a statistics program (it is basically designed as a general charting program), it may be lacking many complex statistics features. Currently, to plot the result of some complex statistics, the developer would need to compute the statistics using his own method. ChartDirector can then be used to plot the result.
The "R" statistics package mentioned in your link also plots the chart in two stages - first it computes the "2D kernel density" (using the kde2d function), then it plots the result as a contour chart. ChartDirector only knows how to plot the contour chart, but not how to compute the "2D kernel density" (or most of the more complex statistics in "R").
I am not sure exactly how the "R" statistics package computes the "2D kernel density". But from the online "R" documentation on the kde2d function, it seems it just assumes a 25 x 25 grid (configurable) and computes the "2D kernel density" at these grid points. The kernel density at a grid point is just the sum of the contributions of all the data points, where the contribution depends on the distance of the data point to the grid point (the further away a data point is, the less it contributes to the grid point).
If you can write some code to compute the 2D kernel density at one point, then you can use a loop to compute the 2D kernel density at 25 x 25 points, and then feed the result to ChartDirector to plot the chart.
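Written out, the per-grid-point sum described above is the standard product-kernel estimator. This is a sketch of the textbook form, assuming N data points (x_i, y_i), a kernel K (for example the standard normal density) and bandwidths h_x, h_y chosen by the user. The R documentation describes kde2d as using an axis-aligned bivariate normal kernel, but its exact bandwidth scaling is not reproduced here.

```latex
\hat{f}(x, y) = \frac{1}{N\, h_x h_y} \sum_{i=1}^{N}
  K\!\left(\frac{x - x_i}{h_x}\right) K\!\left(\frac{y - y_i}{h_y}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}
```

Each term shrinks as (x_i, y_i) moves away from the grid point (x, y), which is the "further away, less contribution" behaviour; evaluating the formula at every grid point gives the surface to contour.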
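A minimal sketch of that loop in plain Python, with no ChartDirector calls, assuming a Gaussian kernel and caller-chosen bandwidths `hx`, `hy`. The function name `kde2d` only mirrors the R function for readability; it is a hypothetical helper, not R's implementation, and it does not pick bandwidths automatically.

```python
import math


def gaussian(u: float) -> float:
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde2d(xs, ys, hx, hy, n=25, x_range=None, y_range=None):
    """Estimate a 2D kernel density on an n x n grid (hypothetical helper).

    Returns (grid_x, grid_y, z) where z[j][i] is the estimated density
    at the grid point (grid_x[i], grid_y[j]).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if n < 2 or hx <= 0 or hy <= 0:
        raise ValueError("need n >= 2 and positive bandwidths")
    # Grid spans the data by default; pass ranges to match the plot area.
    x0, x1 = x_range if x_range else (min(xs), max(xs))
    y0, y1 = y_range if y_range else (min(ys), max(ys))
    grid_x = [x0 + (x1 - x0) * i / (n - 1) for i in range(n)]
    grid_y = [y0 + (y1 - y0) * j / (n - 1) for j in range(n)]
    norm = 1.0 / (len(xs) * hx * hy)
    z = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            # Sum the contribution of every data point; the kernel makes
            # distant points contribute less to this grid point.
            s = sum(gaussian((gx - px) / hx) * gaussian((gy - py) / hy)
                    for px, py in zip(xs, ys))
            row.append(norm * s)
        z.append(row)
    return grid_x, grid_y, z
```

The cost is n x n x (number of data points) kernel evaluations, which is small for a 25 x 25 grid. To plot the result, the rows of `z` would be flattened into one array alongside `grid_x` and `grid_y` and passed to a contour layer; the exact array layout the contour layer expects should be checked in the ChartDirector documentation.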
Regards
Peter Kwan |
Re: A possible addition to scatter points: multivariate kernel density estimation |
Posted by Daniel on Sep-25-2012 15:36 |
|
May I thank you for the well-documented answers you always provide on these two forums :)
"Thanks for the suggestion. The "multivariate kernel density estimation" is really an interesting statistics technique. We will certainly consider that in future versions of ChartDirector."
I remember you gave the same answer about least-squares regression at the time when only linear fitting was available, and you made sure to embed the functionality later. So I am comfortable with your answer.
I am no expert in stats and certainly do not want to indulge in techniques that you cannot comfortably explain to your usual audience. This 2D technique certainly does not belong to the cryptic part of statistical theory and IMHO could be part of ChartDirector just as much as it belongs in more math-oriented packages!
"If you can write some code to compute the 2D kernel density at one point, then you can use a loop to compute the 2D kernel density at 25 x 25 points, and then feed the result to ChartDirector to plot the chart."
I tried it with various matrix sizes and had limited success, but that was more of a random hack than a solid, formal test. I'll try to build a more robust test case!
Regards |
Samples of the test using your input |
Posted by Daniel on Sep-27-2012 17:28 |
|
Hi Peter,
Please find enclosed a sample of the kernel density plot that I was able to add to the scatter chart.
I set an adequate cell-size parameter and calculated the density as a plain "point count" per cell (I was not successful with distance-based weighting). Then AddContourLayer, then some setColorGradient, et voilà! It works great.
I'd be glad to have it integrated into the package, since building the grid requires a bit of tuning. For example, I also discovered that it is important to add grid points outside the "plot area" when you use a big cell. An integrated function would be nice :)
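The approach described above (a cell-size parameter, density as a plain point count per cell, and extra empty cells beyond the plot area when cells are large) might look roughly like the sketch below. It is one reading of that description, not the actual code behind the sample; `count_grid` and `pad_cells` are hypothetical names.

```python
import math


def count_grid(xs, ys, cell, pad_cells=1):
    """Count scatter points per square cell of side `cell` (hypothetical helper).

    The grid is extended by `pad_cells` empty cells on every side so that
    contour lines around clusters near the edge can close instead of being
    cut off. Returns (centers_x, centers_y, counts), where counts[j][i] is
    the number of points in the cell centred at (centers_x[i], centers_y[j]).
    """
    if cell <= 0:
        raise ValueError("cell must be positive")
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    # Integer cell index of each point along each axis.
    ix = [math.floor(x / cell) for x in xs]
    iy = [math.floor(y / cell) for y in ys]
    # First cell index on each axis, including the padding.
    ix0 = min(ix) - pad_cells
    iy0 = min(iy) - pad_cells
    nx = max(ix) - ix0 + 1 + pad_cells
    ny = max(iy) - iy0 + 1 + pad_cells
    counts = [[0] * nx for _ in range(ny)]
    for i, j in zip(ix, iy):
        counts[j - iy0][i - ix0] += 1
    centers_x = [(ix0 + i + 0.5) * cell for i in range(nx)]
    centers_y = [(iy0 + j + 0.5) * cell for j in range(ny)]
    return centers_x, centers_y, counts
```

Computing cell indices with the same `floor` for every point keeps each index inside the grid, and the zero-count border gives the contour layer something to fall off to, which is one plausible reason the extra points outside the plot area help with big cells.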
Daniel
|
Re: Samples of the test using your input |
Posted by Peter Kwan on Sep-28-2012 01:38 |
|
Hi Daniel,
Thanks for providing us with such a beautiful example.
Yes, I think it is definitely a good idea to add it to ChartDirector, considering that it is useful and spectacular, and the algorithm should not be too difficult to implement.
Regards
Peter Kwan |
|