Clustering - HestiaProject/PAxSPL GitHub Wiki
Definition:
Group features based on their dependencies.
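The page does not say how "dependencies" are measured. The sketch below assumes each feature is represented by the set of implementation elements it depends on (for example, the classes it touches). It also assumes two features are closer the more those sets overlap, using Jaccard distance. The feature and element names are hypothetical.

```python
"""Sketch: turn feature dependencies into pairwise distances.

Assumption (not from the wiki page): each feature maps to the set of
implementation elements it depends on; overlap is measured with Jaccard.
"""
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(deps: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Distance for every unordered feature pair, keyed by the sorted name pair."""
    return {
        (f, g): jaccard_distance(deps[f], deps[g])
        for f, g in combinations(sorted(deps), 2)
    }


# Hypothetical feature -> dependency sets.
deps = {
    "Checkout": {"Cart", "Payment", "Order"},
    "Payment": {"Payment", "Gateway"},
    "Search": {"Index", "Query"},
}

for pair, dist in distance_matrix(deps).items():
    print(pair, round(dist, 2))
# ('Checkout', 'Payment') 0.75
# ('Checkout', 'Search') 1.0
# ('Payment', 'Search') 1.0
```

Any other dependency-based similarity measure could be substituted. Jaccard is only used here because it is simple to read.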
Variations:
Agglomerative Hierarchical Clustering (AHC):
A "bottom up" approach where each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
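A minimal AHC sketch over features follows. It assumes Jaccard distance between hypothetical dependency sets and average linkage (the mean distance over all cross-cluster feature pairs). The page prescribes neither choice. The merge history it returns is the information a dendrogram draws.

```python
"""Sketch of agglomerative (bottom-up) hierarchical clustering of features.

Assumptions (illustrative, not from the wiki page): Jaccard distance between
dependency sets, average linkage, and hypothetical feature names.
"""
from itertools import combinations


def jaccard_distance(a, b):
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def agglomerate(deps):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Start with one cluster per feature and repeatedly merge the closest pair
    of clusters until a single cluster remains.
    """
    names = sorted(deps)
    pair_dist = {
        (f, g): jaccard_distance(deps[f], deps[g])
        for f, g in combinations(names, 2)
    }

    def d(f, g):
        return 0.0 if f == g else pair_dist[tuple(sorted((f, g)))]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster feature pairs.
        return sum(d(f, g) for f in c1 for g in c2) / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical feature -> dependency sets.
deps = {
    "Cart": {"CartModel", "Order", "Session"},
    "Checkout": {"Order", "Payment", "Session"},
    "Payment": {"Payment", "Gateway"},
    "Search": {"Index", "Query"},
}

for a, b, dist in agglomerate(deps):
    print(sorted(a), "+", sorted(b), "at", round(dist, 3))
# ['Cart'] + ['Checkout'] at 0.5
# ['Payment'] + ['Cart', 'Checkout'] at 0.875
# ['Search'] + ['Cart', 'Checkout', 'Payment'] at 1.0
```

This brute-force search recomputes linkages at every step, which is fine for a handful of features. Libraries such as those listed under Tools use more efficient implementations.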
Divisive Hierarchical Clustering (DHC):
A "top down" approach where all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
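A simplified DHC sketch follows, based on the DIANA "splinter group" heuristic. That heuristic is swapped in for illustration; the page does not name a splitting rule. The sketch makes the following assumptions:

- The most isolated feature seeds a new group.
- Features that are on average closer to that group are moved into it.
- Both halves are split recursively until every cluster holds one feature.

Distances are again assumed to be Jaccard distances over hypothetical dependency sets. Split heights are omitted for brevity.

```python
"""Simplified divisive (top-down) clustering in the style of DIANA.

Assumptions (illustrative, not from the wiki page): Jaccard distances over
hypothetical dependency sets; result is a nested tuple tree of feature names.
"""


def jaccard_distance(a, b):
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def split(cluster, d):
    """Split one cluster into (splinter, rest) using the splinter heuristic."""
    rest = set(cluster)

    def avg(x, group):
        others = [y for y in group if y != x]
        return sum(d(x, y) for y in others) / len(others) if others else 0.0

    # Seed the splinter group with the feature farthest, on average, from the rest.
    seed = max(sorted(rest), key=lambda x: avg(x, rest))
    splinter = {seed}
    rest.remove(seed)
    while len(rest) > 1:
        # Move the feature that is, on average, most clearly closer to the splinter group.
        gains = {x: avg(x, rest) - avg(x, splinter) for x in rest}
        best = max(sorted(gains), key=gains.get)
        if gains[best] <= 0:
            break
        splinter.add(best)
        rest.remove(best)
    return splinter, rest


def divide(cluster, d):
    """Recursively split until every cluster is a single feature."""
    if len(cluster) == 1:
        return next(iter(cluster))
    left, right = split(cluster, d)
    return (divide(left, d), divide(right, d))


# Hypothetical feature -> dependency sets.
deps = {
    "Cart": {"CartModel", "Order", "Session"},
    "Checkout": {"Order", "Payment", "Session"},
    "Payment": {"Payment", "Gateway"},
    "Search": {"Index", "Query"},
}


def d(f, g):
    return jaccard_distance(deps[f], deps[g])


tree = divide(set(deps), d)
print(tree)  # ('Search', ('Payment', ('Cart', 'Checkout')))
```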
Priority Order:
Group > Extraction > Categorize
Inputs:
Outputs:
- Feature tree
- Feature clusters
- Dendrogram
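Feature clusters are commonly obtained from a dendrogram by cutting it at a distance threshold. The sketch below assumes the dendrogram is encoded as nested `(left, right, merge_distance)` tuples with feature names as leaves. The tree shown is hypothetical and is not the output of any tool listed on this page.

```python
"""Sketch: derive feature clusters by cutting a dendrogram at a threshold.

Assumption (illustrative): nested (left, right, merge_distance) tuples with
feature names as leaves; the example tree is hypothetical.
"""


def leaves(node):
    """All feature names under a dendrogram node."""
    if isinstance(node, str):
        return [node]
    left, right, _ = node
    return leaves(left) + leaves(right)


def cut(node, threshold):
    """Return the feature clusters obtained by cutting at `threshold`.

    A subtree whose merge distance is <= threshold becomes one cluster;
    otherwise its two children are cut independently.
    """
    if isinstance(node, str) or node[2] <= threshold:
        return [sorted(leaves(node))]
    left, right, _ = node
    return cut(left, threshold) + cut(right, threshold)


dendrogram = ("Search", (("Cart", "Checkout", 0.5), "Payment", 0.875), 1.0)

print(cut(dendrogram, 0.6))  # [['Search'], ['Cart', 'Checkout'], ['Payment']]
print(cut(dendrogram, 0.9))  # [['Search'], ['Cart', 'Checkout', 'Payment']]
```

Lower thresholds yield more, tighter feature clusters. Higher thresholds yield fewer, broader ones.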
Examples:
- (Weston et al. 2009)
- (Kelly et al. 2011)
- (Chen et al. 2005)
- (Bécan 2013)
- (Nöbauer et al. 2014b)
- (Rubin et al. 2012)
- (Damasevicius et al. 2012)
- (Eyal-Salman et al. 2014)
- (Alves et al. 2008)
Tools:
- Cluster 3.0;
- Pycluster;
- C Clustering Library;
Related Techniques:
Recommended situations:
Clustering is highly recommended for products whose feature implementations have a high level of dependency on one another. In addition, the technique does not require good documentation.
Not Recommended situations:
As a static analysis technique, clustering may fail to find all elements related to a feature when applied to source code in which that feature's implementation is spread across several modules.