Likelihood (and EM algorithm) - SoojungHong/StatisticalMind GitHub Wiki
Likelihood
Given a set of observed outcomes, we estimate the parameters of a model. The probability of observing those outcomes under the estimated parameters is called the likelihood.
For a probability distribution, the parameters might be its mean and variance. We compute the mean and variance from the given data set and assume a distributional form for the data. With this information, we can evaluate the likelihood, i.e. the probability of the observed outcomes under the fitted distribution.
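As a minimal sketch of this idea (the sample values here are made up for illustration), we can estimate a Gaussian's mean and variance from data and then evaluate the log-likelihood of that same data under the fitted distribution:

```python
import math

def gaussian_log_likelihood(data, mean, var):
    """Log-likelihood of the data under a normal distribution N(mean, var)."""
    n = len(data)
    ll = -0.5 * n * math.log(2 * math.pi * var)
    ll -= sum((x - mean) ** 2 for x in data) / (2 * var)
    return ll

# Hypothetical sample; estimate parameters from it (maximum-likelihood estimates).
sample = [4.8, 5.1, 5.0, 4.9, 5.2]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)

# Log-likelihood of the sample under the fitted Gaussian.
print(gaussian_log_likelihood(sample, mean, var))
```

The maximum-likelihood estimates are, by definition, the parameter values that make the observed data most probable; plugging in any other mean or variance yields a lower log-likelihood.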
EM-algorithm
The EM algorithm iteratively maximizes the likelihood of the observed data while estimating the parameters of a statistical model that contains unobserved (latent) variables.
The best part is…
By optimizing the likelihood, EM generates an awesome model that assigns class labels to data points — sounds like clustering to me!
EM algorithm steps
EM begins by making a guess at the model parameters.
Then it alternates between two steps:
E-step: Based on the current model parameters, compute the probability of assigning each data point to each cluster.
M-step: Update the model parameters based on the cluster assignments (responsibilities) from the E-step.
Repeat until the model parameters and cluster assignments stabilize (a.k.a. convergence).
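The steps above can be sketched as a toy EM implementation for a one-dimensional, two-component Gaussian mixture. This is an educational sketch, not a production implementation; the data, the min/max initialization, and the fixed iteration count are all assumptions made for illustration:

```python
import math

def em_gmm(data, iters=50):
    """EM for a 1-D two-component Gaussian mixture (educational sketch)."""
    # Initial guess: a simple deterministic choice, the two extreme points.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [weights[j] / math.sqrt(2 * math.pi * variances[j])
                    * math.exp(-(x - means[j]) ** 2 / (2 * variances[j]))
                    for j in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, data)) / nj
            variances[j] = max(variances[j], 1e-6)  # guard against collapse
            weights[j] = nj / len(data)
    return means, variances, weights

# Hypothetical data drawn from two well-separated clusters (around 0 and 10).
data = [0.1, -0.2, 0.3, 0.0, 9.8, 10.1, 10.2, 9.9]
means, variances, weights = em_gmm(data)
print(means)  # one mean should land near each cluster
```

Note how the E-step's soft cluster assignments are exactly the "class labels" mentioned above: after convergence, assigning each point to its highest-responsibility component yields a clustering of the data.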