by merusame on 5/2/16, 8:10 AM with 40 comments
by mattnedrich on 5/2/16, 2:23 PM
Some comments on K-Means: one large limitation of K-Means is that it assumes spherically shaped clusters. It will fail badly on any other cluster shape.
It's interesting that the author compared results on the same data set for the different algorithms. Each clustering approach is going to work best on a specific type of data set. It would be interesting to compare them across several different data sets to get a better feel for strengths/weaknesses, etc.
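To make the spherical-cluster point concrete, here is a minimal sketch (mine, not from the article) using scikit-learn's two-moons toy data: K-Means cuts the moons with a straight boundary, while a density-based method such as DBSCAN recovers them. The eps value is an assumption tuned to this toy set.

    # Sketch: K-Means vs. a density-based method on non-spherical clusters.
    from sklearn.cluster import KMeans, DBSCAN
    from sklearn.datasets import make_moons
    from sklearn.metrics import adjusted_rand_score

    X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    db = DBSCAN(eps=0.3).fit(X)  # eps chosen by hand for this toy set

    # K-Means scores well below 1.0 here; DBSCAN is near-perfect.
    print("K-Means ARI:", adjusted_rand_score(y_true, km.labels_))
    print("DBSCAN ARI: ", adjusted_rand_score(y_true, db.labels_))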
by jey on 5/2/16, 4:46 PM
Here's more from an actual expert: http://research.microsoft.com/en-US/people/kannan/book-chapt...
by makmanalp on 5/2/16, 3:51 PM
Is it just a matter of tweaking the definition of density / distance to the number of hops, or is it a different problem entirely? I can see how with 0 or 1 hops the data would be a very smushed distribution, whereas 2D distances are much richer and more spread out.
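One way to read "distance = number of hops" (my sketch, not something from the article): compute all-pairs hop counts on the graph and feed them to an off-the-shelf density-based clusterer as a precomputed metric. The barbell graph below is an assumed toy input; note that the hop distances are small integers, which is exactly the smushed-resolution issue raised above.

    # Sketch: density clustering with hop count as the distance metric.
    import networkx as nx
    import numpy as np
    from sklearn.cluster import DBSCAN

    G = nx.barbell_graph(5, 4)  # two 5-cliques joined by a 4-node path
    n = G.number_of_nodes()
    hops = dict(nx.all_pairs_shortest_path_length(G))
    D = np.array([[hops[i][j] for j in range(n)] for i in range(n)])

    labels = DBSCAN(eps=1, min_samples=4, metric="precomputed").fit_predict(D)
    print(labels)  # the two cliques separate; mid-path nodes come out as noise (-1)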
by Xcelerate on 5/2/16, 1:24 PM
A model is a guess about the underlying process that "generates" the data. If you're trying to use hyperplanes to divide data that lies on a manifold, then you are going to have poor results no matter how good your fitting algorithm is.
On the other hand, even if you know the true model, high levels of noise can prevent you from recovering the correct parameters. For instance, Max-Cut is NP-hard, and the best we can do in polynomial time is a semidefinite programming relaxation. Beyond a certain noise threshold, the gap between the SDP solution and the true solution becomes very large very quickly.
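For context (my addition, not the commenter's), the relaxation in question: Max-Cut's ±1 integer program is relaxed by replacing the rank-one assignment with any PSD matrix with unit diagonal, and the "gap" is between the optima of these two programs.

    % Max-Cut as an integer program (NP-hard):
    \max_{x \in \{-1,+1\}^n} \; \tfrac{1}{4} \sum_{i,j} w_{ij}\,(1 - x_i x_j)

    % SDP relaxation (polynomial time; Goemans--Williamson rounding
    % guarantees at least ~0.878 of the optimum in expectation):
    \max_{X \succeq 0,\; X_{ii} = 1} \; \tfrac{1}{4} \sum_{i,j} w_{ij}\,(1 - X_{ij})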
by graycat on 5/2/16, 12:11 PM
See the cluster-analysis work of Glenn W. Milligan at Ohio State, going back to at least 1980.