Clustering Algorithms - rugbyprof/5443-Data-Mining GitHub Wiki

Clustering is a process that partitions a given data set into homogeneous groups based on given features, so that similar objects are kept in the same group and dissimilar objects fall into different groups. It is one of the most important unsupervised learning problems, and it deals with finding structure in a collection of unlabeled data. For a clustering algorithm to be useful, it should satisfy the following conditions.

  1. Scalability - The clustering algorithm must scale to large data sets; otherwise, clustering only a sample of the data may give misleading results.
  2. The clustering algorithm must be able to deal with different types of attributes.
  3. The clustering algorithm must be able to find clusters of arbitrary shape.
  4. The clustering algorithm must be insensitive to noise and outliers.
  5. Interpretability and Usability - The result obtained must be interpretable and usable, so that as much knowledge as possible about the input data can be obtained.
  6. The clustering algorithm must be able to deal with data sets of high dimensionality.

(https://sites.google.com/site/dataclusteringalgorithms/home)
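
To make the idea of partitioning unlabeled data concrete, here is a minimal sketch of one well-known clustering algorithm, k-means (Lloyd's algorithm), using only the Python standard library. The page above does not prescribe any particular algorithm, so the choice of k-means, the function name `kmeans`, the sample points in `data`, and the simplification of using the first `k` points as starting centroids are all assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters with a basic k-means (Lloyd's algorithm)."""
    # Assumption for this sketch: start from the first k points.
    # Real implementations usually pick starting centroids at random.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical unlabeled data: two visibly separate groups of points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(data, k=2)
print(labels)     # one cluster index per input point
print(centroids)  # one (x, y) centre per cluster
```

Similar points end up with the same label and dissimilar points with different labels, which is the grouping described at the top of this page. Note that plain k-means does not meet every condition in the list: it favors roughly round clusters and its centroids are pulled by outliers, so conditions 3 and 4 in particular call for other algorithms.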