Cluster Validity Indices - tahiri-lab/KMeansPhyloTreesClustering GitHub Wiki
📊 Cluster Validity Indices (CH and BH)
In clustering, selecting the optimal number of clusters (K) is a critical step.
K-means requires a predefined number of clusters, but the correct value of K is usually unknown.
Cluster validity indices address this by scoring the partition obtained for each candidate K, so that the candidates can be compared.
🔹 Calinski-Harabasz Index (CH)
The Calinski-Harabasz index evaluates clustering quality based on:
- Separation between clusters
- Compactness within clusters
Objective
Maximize:
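(Between-cluster dispersion / (K − 1)) / (Within-cluster dispersion / (N − K)), where N is the number of clustered objects and K the number of clusters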
Interpretation
- Higher CH value → better clustering: clusters are well separated and compact
- CH is undefined for K = 1 (the K − 1 term is zero), so it can only compare partitions with K ≥ 2
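As an illustration of the ratio described above, here is a minimal sketch of CH for points in Euclidean space. The function name `calinski_harabasz` and the use of plain coordinate tuples are assumptions made for the example; this project clusters phylogenetic trees, so its own implementation presumably works from tree distances rather than coordinates.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """CH = [SSB / (K - 1)] / [SSW / (N - K)] for Euclidean points (illustrative sketch)."""
    n = len(points)
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("CH is only defined for 2 <= K < N")

    def centroid(pts):
        return tuple(sum(c) / len(pts) for c in zip(*pts))

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    ssb = 0.0  # between-cluster dispersion
    ssw = 0.0  # within-cluster dispersion
    for pts in groups.values():
        c = centroid(pts)
        ssb += len(pts) * sq_dist(c, overall)
        ssw += sum(sq_dist(p, c) for p in pts)
    if ssw == 0:
        return float("inf")  # every point sits exactly on its centroid
    return (ssb / (k - 1)) / (ssw / (n - k))


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(calinski_harabasz(pts, [0, 0, 1, 1]))  # 200.0 (the two tight pairs are separated)
print(calinski_harabasz(pts, [0, 1, 0, 1]))  # about 0.02 (each cluster spans both pairs)
```

The same four points score very differently depending on the partition, which is what makes CH usable for comparing candidate clusterings.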
🔹 Ball-Hall Index (BH)
The Ball-Hall index focuses on:
- Compactness of clusters
Objective
Minimize:
- Average within-cluster dispersion: the mean, over all clusters, of each cluster's mean squared distance to its centroid
Interpretation
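- Lower BH value → tighter, more homogeneous clusters
- BH generally decreases as K grows, since more clusters means smaller clusters, so it is most informative when compared across successive values of K rather than read as an absolute score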
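The sketch below shows one common way to compute the Ball-Hall index on Euclidean points: average the squared distance to the centroid inside each cluster, then average those values over the clusters. The name `ball_hall` and the coordinate-tuple input are assumptions for the example and are not taken from this project's code.

```python
from collections import defaultdict


def ball_hall(points, labels):
    """Mean over clusters of the mean squared distance to the cluster centroid."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)

    def centroid(pts):
        return tuple(sum(c) / len(pts) for c in zip(*pts))

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    total = 0.0
    for pts in groups.values():
        c = centroid(pts)
        total += sum(sq_dist(p, c) for p in pts) / len(pts)
    return total / len(groups)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(ball_hall(pts, [0, 0, 0, 0]))  # 25.25 (one loose cluster)
print(ball_hall(pts, [0, 0, 1, 1]))  # 0.25  (two tight clusters)
```

Unlike CH, this quantity is still defined when all points fall in a single cluster, as the first call shows.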
🌳 CH and BH in This Project
In this project:
- K-means is executed for multiple values of K (from Kmin to Kmax)
- For each K, CH and BH indices are computed
- The optimal number of clusters is selected based on these indices
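The loop over K described in the list above can be sketched as follows. Several things here are assumptions made only for illustration: the points are Euclidean tuples rather than trees, the helper names (`kmeans`, `ch_index`, `bh_index`, `select_k`) are hypothetical, the k-means uses a deterministic farthest-first initialization so the example is reproducible, and the final choice simply takes the K with the highest CH while reporting BH alongside it. This page does not specify how the project combines the two indices.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-first initial centers; returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster went empty
                centers[j] = centroid(members)
    return labels


def clusters_of(points, labels):
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return list(groups.values())


def ch_index(points, labels):
    clusters = clusters_of(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    ssb = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    ssw = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return float("inf") if ssw == 0 else (ssb / (k - 1)) / (ssw / (n - k))


def bh_index(points, labels):
    clusters = clusters_of(points, labels)
    return sum(sum(sq_dist(p, centroid(c)) for p in c) / len(c) for c in clusters) / len(clusters)


def select_k(points, k_min, k_max):
    """Run k-means for each K and return (best K by CH, {K: (CH, BH)})."""
    scores = {}
    for k in range(max(k_min, 2), k_max + 1):  # CH needs K >= 2
        labels = kmeans(points, k)
        scores[k] = (ch_index(points, labels), bh_index(points, labels))
    best_k = max(scores, key=lambda k: scores[k][0])
    return best_k, scores


# Three well-separated groups of four points each.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
points = [(x + dx, y + dy) for dx, dy in [(0, 0), (10, 0), (0, 10)] for x, y in square]
best_k, scores = select_k(points, 2, 5)
print(best_k)  # 3
```

On this toy data CH peaks at K = 3, matching the three groups, while BH keeps shrinking as K increases.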
🎯 Role in the Workflow
CH and BH allow:
- Automatic selection of the best number of clusters
- Avoiding an arbitrary choice of K
- Improving the reliability of clustering results