Clustering Algorithms - rugbyprof/5443-Data-Mining GitHub Wiki
Clustering partitions a data set into homogeneous groups based on the given features, so that similar objects fall in the same group and dissimilar objects fall in different groups. It is often regarded as one of the most important unsupervised learning problems, since it deals with finding structure in a collection of unlabeled data. For a clustering algorithm to be useful, it should satisfy the following conditions.
- Scalability - The algorithm must scale to large data sets; clustering only a sample of a large data set may give biased or wrong results.
- The clustering algorithm must be able to deal with different types of attributes.
- The clustering algorithm must be able to find clusters of arbitrary shape.
- The clustering algorithm must be insensitive to noise and outliers.
- Interpretability and Usability - The results must be interpretable and usable so that maximum knowledge about the input parameters can be obtained.
- The clustering algorithm must be able to deal with data sets of high dimensionality.
(https://sites.google.com/site/dataclusteringalgorithms/home)
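
To make the idea of partitioning unlabeled data into groups of similar objects concrete, here is a minimal sketch using k-means. The text above does not name any specific algorithm, so k-means is chosen only as an illustration. The function names (`assign`, `update`, `kmeans`), the sample points, and the choice of initial centroids are all assumptions made for this example.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members)
                  for d in range(len(old)))
        )
    return new_centroids

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        new_labels = assign(points, new_centroids)
        converged = new_labels == labels and new_centroids == centroids
        centroids, labels = new_centroids, new_labels
        if converged:
            break
    return labels, centroids

# Hypothetical unlabeled data: two visually separate groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Assumption: one initial centroid is taken from each region.
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

This sketch also shows why some of the conditions above are hard to meet. Because each centroid is a mean, a single distant outlier can pull it away from its group, and distance to a centroid favors compact, roughly round clusters rather than arbitrary shapes.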