Basics of Clustering
- Clustering: finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
- Why is clustering difficult?
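- Curse of dimensionality: in high-dimensional spaces, almost all pairs of points lie at about the same distance from each other, so distance-based similarity loses its ability to tell near neighbours from far ones.
- Clusters can be ambiguous: the same set of points can often be grouped reasonably in more than one way, for example into different numbers of clusters.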
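The distance-concentration effect behind the curse of dimensionality can be checked empirically. This is a minimal sketch, assuming points drawn uniformly from the unit hypercube and Euclidean distance; `relative_contrast` is a hypothetical helper that returns (max - min) / min over all pairwise distances. As the dimension grows, this ratio should shrink, meaning the nearest and farthest pairs become hard to tell apart.

```python
import math
import random

def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dists.append(math.dist(points[i], points[j]))
    return dists

def relative_contrast(n_points, dim, seed=0):
    """Hypothetical helper: (max - min) / min of the pairwise distances
    of n_points drawn uniformly at random from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = pairwise_distances(points)
    return (max(dists) - min(dists)) / min(dists)

# The ratio drops as the dimension increases.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(50, dim), 3))
```

The exact values depend on the seed and on the number of points, but the downward trend from low to high dimension is the point of the experiment.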
Preliminaries:
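- Similarity: usually measured by the distance between the objects' feature vectors (a smaller distance means more similar objects).
- Evaluation:
• Cluster cohesion (how closely related the objects within a cluster are) and separation (how distinct a cluster is from the other clusters)
• Silhouette coefficient: combines the ideas of cohesion and separation, and can be computed for individual points as well as for whole clusters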
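The evaluation measures above can be sketched in a few lines of Python. This is a minimal sketch, assuming Euclidean distance (`math.dist`, Python 3.8+) and the common sum-of-squares definitions: cohesion as the within-cluster sum of squared distances to each centroid, and separation as the size-weighted squared distance from each centroid to the overall mean. For the silhouette coefficient of a point, a is its mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and s = (b - a) / max(a, b), which ranges from -1 to 1. All function names are hypothetical, and scoring singleton clusters as s = 0 is an assumed convention.

```python
import math
from collections import defaultdict

def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return clusters

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def cohesion_sse(points, labels):
    """Within-cluster sum of squared distances to each cluster centroid
    (lower = more cohesive clusters)."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        c = centroid(members)
        total += sum(math.dist(p, c) ** 2 for p in members)
    return total

def separation_ssb(points, labels):
    """Between-cluster sum of squares: size-weighted squared distance from
    each cluster centroid to the overall mean (higher = better separated)."""
    overall = centroid(points)
    return sum(len(members) * math.dist(centroid(members), overall) ** 2
               for members in group_by_label(points, labels).values())

def silhouette_per_point(points, labels):
    """s_i = (b - a) / max(a, b) for every point; needs at least two clusters."""
    clusters = group_by_label(points, labels)
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of p's own cluster (cohesion).
        own = [q for j, q in enumerate(points) if labels[j] == lab and j != i]
        if not own:
            scores.append(0.0)  # assumed convention: singleton cluster gets s = 0
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: smallest mean distance to the members of any other cluster (separation).
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def silhouette_per_cluster(points, labels):
    """Cluster-level silhouette: average of the per-point coefficients."""
    per_cluster = defaultdict(list)
    for s, lab in zip(silhouette_per_point(points, labels), labels):
        per_cluster[lab].append(s)
    return {lab: sum(v) / len(v) for lab, v in per_cluster.items()}

# Two small, well-separated toy clusters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
print(cohesion_sse(points, labels), separation_ssb(points, labels))
print(silhouette_per_cluster(points, labels))
```

On these well-separated toy clusters every point's coefficient is close to 1; a poor labelling should push the coefficients toward zero or below. Averaging the per-point values over all points gives a single score for the whole clustering.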