Clustering - HestiaProject/PAxSPL GitHub Wiki

Definition:

Group features based on their dependencies.

Variations:

Agglomerative Hierarchical Clustering (AHC):

A "bottom up" approach where each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
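
The bottom-up merging described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `DEPENDENCIES` data, the choice of Jaccard distance over dependency sets, and single linkage are all hypothetical choices made for the example, not something the wiki prescribes.

```python
from itertools import combinations

# Hypothetical data: feature name -> code elements it depends on (illustrative only).
DEPENDENCIES = {
    "login": {"User", "Session", "Crypto"},
    "logout": {"User", "Session"},
    "report": {"Database", "Pdf"},
    "export": {"Database", "Pdf", "Csv"},
}

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0.0 means identical dependency sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_distance(c1, c2, deps):
    """Single linkage (an assumption): distance between the two closest members."""
    return min(jaccard_distance(deps[x], deps[y]) for x in c1 for y in c2)

def agglomerative(deps, num_clusters):
    # Every feature starts in its own cluster.
    clusters = [frozenset([name]) for name in sorted(deps)]
    merges = []  # merge history: the information a dendrogram would draw
    while len(clusters) > num_clusters:
        # Pick the closest pair of clusters and merge it.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], deps),
        )
        merges.append((set(c1), set(c2), cluster_distance(c1, c2, deps)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters, merges

clusters, merges = agglomerative(DEPENDENCIES, num_clusters=2)
for cluster in clusters:
    print(sorted(cluster))
```

With this sample data, features that share most of their dependencies end up in the same cluster, and `merges` records the order in which clusters were joined.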

Divisive Hierarchical Clustering (DHC):

A "top down" approach where all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
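
A top-down split can be sketched in the same spirit. The heuristic below is an assumption chosen for brevity (it bisects the widest cluster around its two most distant members) and is not a specific published algorithm; the `DEPENDENCIES` data and the Jaccard distance are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical data: feature name -> code elements it depends on (illustrative only).
DEPENDENCIES = {
    "login": {"User", "Session", "Crypto"},
    "logout": {"User", "Session"},
    "report": {"Database", "Pdf"},
    "export": {"Database", "Pdf", "Csv"},
}

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0.0 means identical dependency sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def diameter(cluster, deps):
    """Largest pairwise distance inside a cluster (0.0 for a singleton)."""
    return max(
        (jaccard_distance(deps[x], deps[y]) for x, y in combinations(sorted(cluster), 2)),
        default=0.0,
    )

def split(cluster, deps):
    """Bisect a cluster around its two most distant members (simple heuristic)."""
    seed_a, seed_b = max(
        combinations(sorted(cluster), 2),
        key=lambda p: jaccard_distance(deps[p[0]], deps[p[1]]),
    )
    left, right = {seed_a}, {seed_b}
    for name in cluster - {seed_a, seed_b}:
        d_a = jaccard_distance(deps[name], deps[seed_a])
        d_b = jaccard_distance(deps[name], deps[seed_b])
        # Each remaining feature joins the seed it is closer to.
        (left if d_a <= d_b else right).add(name)
    return left, right

def divisive(deps, num_clusters):
    # All features start in a single cluster.
    clusters = [set(deps)]
    while len(clusters) < num_clusters:
        widest = max(clusters, key=lambda c: diameter(c, deps))
        if len(widest) < 2:
            break  # nothing left to split
        clusters.remove(widest)
        clusters.extend(split(widest, deps))
    return clusters

clusters = divisive(DEPENDENCIES, num_clusters=2)
for cluster in clusters:
    print(sorted(cluster))
```

Continuing the loop with a larger `num_clusters` keeps splitting the widest remaining cluster, which mirrors moving down the hierarchy.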

Priority Order:

Group > Extraction > Categorize

Inputs:

Outputs:

  • Feature tree
  • Feature clusters
  • Dendrogram tree

Examples:

Tools:

  • Cluster 3.0
  • Pycluster
  • C Clustering Library

Related Techniques:

Recommended situations:

Clustering is highly recommended for products that have a high level of dependencies between feature implementations. In addition, good documentation is not required to apply this technique.

Not Recommended situations:

As a static analysis technique, clustering may fail to find all elements related to the same feature when it is applied to source code in which the implementation of a feature is spread across several modules.