k means clustering algorithm - rugbyprof/5443-Data-Mining GitHub Wiki
KMeans:
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
Compare K-means and GMM:
KMeans:
1. Objective function:§Minimize the TSD
2. Can be optimized by an EM algorithm.
§E-step: assign points to clusters.
§M-step: optimize clusters.
§Performs hard assignment during E-step.
3. Assumes spherical clusters with equal probability of a cluster.
GMM:
1. Objective function:§Maximize the log-likelihood.
2. EM algorithm:
§E-step: Compute posterior probability of membership.
§M-step: Optimize parameters.
§Perform soft assignment during E-step.
3. Can be used for non-sphericalclusters. Can generate clusterswith different probabilities.