k means clustering algorithm - rugbyprof/5443-Data-Mining GitHub Wiki

KMeans:

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

Compare K-means and GMM:
KMeans:
1. Objective function:§Minimize the TSD
2. Can be optimized by an EM algorithm.
          §E-step: assign points to clusters.
          §M-step: optimize clusters.
          §Performs hard assignment during E-step.
3. Assumes spherical clusters with equal probability of a cluster.


GMM:
1. Objective function:§Maximize the log-likelihood.
2. EM algorithm:
          §E-step: Compute posterior probability of membership.
          §M-step: Optimize parameters.
          §Perform soft assignment during E-step.
3. Can be used for non-sphericalclusters. Can generate clusterswith different probabilities.