by fagnerbrack on 12/26/24, 11:04 PM with 21 comments
by throw_pm23 on 12/27/24, 12:08 AM
- scale-invariance: stretching data along some dimensions should not change clustering.
This is clearly not true: . . . (three well-spaced spots) may reasonably be seen as three clusters, whereas ||| (three nearby elongated bars) may not.
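A rough way to see this numerically, as a sketch rather than anything from the article: take three columns of points and compare two candidate groupings ("by column" vs. "by horizontal band") under the k-means objective (within-cluster sum of squares), before and after stretching the y axis. The point layout, the two candidate partitions, and the helper names `wcss`, `make_points` and `preferred_grouping` are all assumptions made up for illustration. Note that this is per-axis stretching as phrased above, not a uniform rescaling of all distances.

```python
def wcss(points, labels):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


def make_points(y_scale):
    # Hypothetical data: three columns at x = 0, 10, 20 with nine points each.
    # y_scale stretches the data along the y axis only.
    return [(10.0 * col, y_scale * row) for col in range(3) for row in range(9)]


def preferred_grouping(y_scale):
    """Return which of the two candidate partitions has the lower k-means cost."""
    points = make_points(y_scale)
    by_column = [col for col in range(3) for row in range(9)]    # one cluster per column
    by_band = [row // 3 for col in range(3) for row in range(9)]  # three horizontal bands
    cost_columns = wcss(points, by_column)
    cost_bands = wcss(points, by_band)
    return "columns" if cost_columns < cost_bands else "bands"


print(preferred_grouping(0.1))   # compact spots: prints "columns"
print(preferred_grouping(10.0))  # same data stretched 100x along y: prints "bands"
```

This only compares two hand-picked partitions, so it says nothing about what a full k-means run would converge to from a given initialization. It does suggest that a common clustering objective changes its preferred grouping when one dimension is stretched.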
- richness: all groupings must be reachable.
Also not quite true: the two cases (1) every cluster is a singleton point and (2) a single cluster contains all points mean the same thing: no useful cluster structure was found. So it is enough if one of these groupings is reachable, not both.
- consistency: increasing inter-cluster differences and decreasing intra-cluster differences should not change clustering.
Also not quite true: suppose we have 9 clusters:
. . .
. . .
. . .
Now move the points so that the columns get further apart. At some point we will get:
| | |, where 3 clusters are more reasonable.
by monkeyjoe on 12/27/24, 2:30 PM
by joe_the_user on 12/26/24, 11:49 PM
by Xcelerate on 12/27/24, 1:14 PM
Now if the goal is a quick prototype or to get an intuitive sense of the structure of the data, then sure, it’s fine.
But of course you’re always sacrificing something desirable when you try to shoehorn data into a model that doesn’t fit.
by keithalewis on 12/27/24, 9:24 AM
by piker on 12/27/24, 9:56 AM
[1] https://academic.oup.com/jrsssb/article-abstract/63/2/411/70...
by jpcom on 12/27/24, 12:54 AM