EM (Expectation Maximization) - SoojungHong/TextMining GitHub Wiki

EM algorithm

In statistics, the EM algorithm is an iterative method that maximizes the likelihood of the observed data while estimating the parameters of a statistical model with unobserved (latent) variables.
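
To make the objective concrete, the quantity being maximized can be written as follows. This is only a sketch in notation that is not from the original page: the symbols x_i (observed data points), z (the unobserved variable, e.g. a cluster index) and \theta (the model parameters) are assumed here for illustration.

```latex
\ell(\theta) \;=\; \sum_{i=1}^{n} \log p(x_i \mid \theta)
            \;=\; \sum_{i=1}^{n} \log \sum_{z} p(x_i, z \mid \theta)
```

The inner sum over z is what makes direct maximization awkward, and it is the reason EM alternates between estimating z and updating \theta.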

Why it is similar to clustering

By optimizing the likelihood, EM generates a model that assigns class labels (as probabilities) to data points, which sounds like clustering to me!

In fact, the EM algorithm is one of the well-known clustering algorithms.

Well-known clustering algorithms

k-Means

Hierarchical Cluster Analysis (HCA)

Expectation Maximization

EM algorithm steps

EM begins by making a guess at the model parameters.

Then it follows an iterative 3-step process:

  1. E-step: Using the current model parameters, calculate the probability that each data point belongs to each cluster.
  2. M-step: Update the model parameters based on the cluster assignments from the E-step.
  3. Repeat until the model parameters and cluster assignments stabilize (a.k.a. convergence).
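
The steps above can be sketched in code. The example below is a minimal illustration, not a reference implementation: it assumes the model is a one-dimensional Gaussian mixture (the page does not name a specific model), and the function names `gaussian_pdf`, `em_gmm_1d` and `hard_labels`, the quantile-based initial guess, and the log-likelihood stopping rule are all choices made here for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k=2, max_iter=200, tol=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch).

    Assumes len(data) >= k and that no component ends up with zero total
    responsibility, which holds for reasonably separated data.
    """
    n = len(data)

    # Initial guess: means at evenly spaced quantiles of the data,
    # a shared variance equal to the overall variance, and equal weights.
    sorted_data = sorted(data)
    means = [sorted_data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    prev_ll = -math.inf
    resp = []
    for _ in range(max_iter):
        # E-step: probability that each point belongs to each cluster,
        # computed from the current parameters.
        resp = []
        ll = 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
            ll += math.log(total)

        # M-step: re-estimate weights, means and variances from those probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a zero variance

        # Convergence check: stop once the log-likelihood barely changes.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    # resp holds the probabilities from the last E-step that was run.
    return weights, means, variances, resp


def hard_labels(resp):
    """Turn soft assignments into cluster labels by picking the most probable cluster."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in resp]
```

Calling `em_gmm_1d(points, k=2)` on a list of floats returns the mixture weights, means, variances, and the per-point cluster probabilities; `hard_labels` then gives one label per point, which is the clustering view of the result. Unlike k-Means, the assignments stay soft until that last step.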