6. Stability
Similarity Index (Lange et al., 2004): the percentage of pairs of observations that belong to the same
cluster in both clustering C and clustering C’.
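As a rough illustration, the definition above can be read literally as a pair count. The sketch below assumes that reading, with all pairs of observations as the denominator; the function name `pair_similarity` and the toy labels are hypothetical, and this is not necessarily the exact procedure of Lange et al. or of any R package.

```python
from itertools import combinations


def pair_similarity(labels_a, labels_b):
    """Share of observation pairs placed in the same cluster by both clusterings.

    Assumption: the denominator is the number of all pairs, which is one
    literal reading of the slide; other normalizations are possible.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    together_in_both = sum(
        1
        for i, j in pairs
        if labels_a[i] == labels_a[j] and labels_b[i] == labels_b[j]
    )
    return together_in_both / len(pairs)


# Hypothetical toy data: six observations clustered twice.
c = [0, 0, 0, 1, 1, 1]
c_prime = [1, 1, 0, 0, 0, 0]
share = pair_similarity(c, c_prime)  # 4 of the 15 pairs are together in both
```

Under this normalization two identical clusterings do not score 1, because pairs that are separated in both clusterings are not counted as agreement.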
7. Cluster Integrity – Heterogeneity
Total separation of clusters: based on the distance between cluster centers
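A minimal sketch of one such measure, assuming the slide refers to the total-separation term of the SD validity index in Halkidi et al. (2000): Dis = (D_max / D_min) * sum over k of 1 / (sum over z of ||v_k - v_z||), where v_k are the cluster centers. The function name and the example centers are hypothetical.

```python
import math


def total_separation(centers):
    """Total separation term computed from cluster centers only.

    Assumption: this follows the Dis term of the SD validity index;
    D_max and D_min are the largest and smallest distances between
    two different centers.
    """
    k = len(centers)
    if k < 2:
        raise ValueError("need at least two cluster centers")
    dists = [[math.dist(a, b) for b in centers] for a in centers]
    off_diagonal = [dists[i][j] for i in range(k) for j in range(k) if i != j]
    d_max, d_min = max(off_diagonal), min(off_diagonal)
    if d_min == 0:
        raise ValueError("cluster centers must be distinct")
    return (d_max / d_min) * sum(1.0 / sum(row) for row in dists)


# Hypothetical centers lying on one line, 5 units apart.
centers = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
separation = total_separation(centers)
```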
8. Cluster Integrity – Homogeneity
Scatter (compactness): average ratio of the cluster variance to the variance of the dataset.
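One way to compute this, sketched under the assumption that "variance" means the per-dimension variance vector and that the ratio compares vector norms, as in the scatter term of the SD validity index (Halkidi et al., 2000). The helper names and the toy data are hypothetical.

```python
import math


def variance_vector(points):
    """Per-dimension (population) variance of a list of points."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    return [sum((p[d] - means[d]) ** 2 for p in points) / n for d in range(dims)]


def norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def scatter(points, labels):
    """Average ratio of cluster variance norm to dataset variance norm."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    dataset_norm = norm(variance_vector(points))
    if dataset_norm == 0:
        raise ValueError("dataset has zero variance")
    cluster_norms = [norm(variance_vector(members)) for members in clusters.values()]
    return sum(cluster_norms) / (len(clusters) * dataset_norm)


# Hypothetical toy data: two tight clusters that lie far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = ["a", "a", "b", "b"]
compactness = scatter(points, labels)
```

Smaller values indicate clusters that are compact relative to the spread of the whole dataset.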
9. Accuracy
[Figure: two panels, "Reality" and "Prediction", each showing observations 1 to 9 grouped into segments.]
Adjusted Rand Index (Hubert and Arabie, 1985): level of agreement between the predicted segment and the real
segment correcting for the expected level of agreement.
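The correction for expected agreement can be made concrete with a short sketch. It uses the standard contingency-table form of the Adjusted Rand Index, ARI = (index - expected index) / (maximum index - expected index); the function name and the toy labels are hypothetical and are not taken from the figure.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(real, predicted):
    """Adjusted Rand Index between real and predicted segment labels."""
    n = len(real)
    if n != len(predicted):
        raise ValueError("label lists must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    # Pairs that are together in both partitions (contingency-table cells).
    index = sum(comb(c, 2) for c in Counter(zip(real, predicted)).values())
    # Pairs that are together within each partition separately.
    real_pairs = sum(comb(c, 2) for c in Counter(real).values())
    predicted_pairs = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = real_pairs * predicted_pairs / comb(n, 2)
    maximum = (real_pairs + predicted_pairs) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one segment.
        return 1.0
    return (index - expected) / (maximum - expected)


# Hypothetical labels for six observations.
real = [0, 0, 0, 1, 1, 1]
predicted = [1, 1, 0, 0, 0, 0]
ari = adjusted_rand_index(real, predicted)
```

The index depends only on which observations are grouped together, so relabeling the predicted segments does not change the result.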
22. Anita Prinzie, Nicole Huyghe
anita@solutions2.be
www.solutions2.be
23. References
• Fred, A.L.N. and Jain, A.K. (2005), Combining Multiple Clusterings
Using Evidence Accumulation, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 27(6), 835-850.
• Lange, T., Roth, V., Braun, M.L. and Buhmann, J.M. (2004), Stability-
Based Validation of Clustering Solutions, Neural Computation, 16,
1299-1323.
• Halkidi, M., Vazirgiannis, M. and Batistakis, Y. (2000), Quality Scheme
Assessment in the Clustering Process, Proc. of the 4th European
Conference on Principles of Data Mining and Knowledge
Discovery, 265-276.
• Hubert, L. and Arabie, P. (1985), Comparing Partitions, Journal of
Classification, 2, 193-218.
• Nieweglowski, L. (2007), clv package, R software.
• Martin, A., Quinn, K.M. and Park, J.H. (2003-2012), Markov Chain Monte Carlo
Package (MCMCpack), R software.