ML2 - Lec (9)
1. Mixture Models 🎲
- What?: Probabilistic models that assume data is generated from multiple distributions (e.g., Gaussian).
- Key Components:
  - Component Distributions: Individual probability distributions (e.g., Gaussian).
  - Mixing Weights: Proportion of each component in the mixture.
2. Gaussian Mixture Model (GMM) 📊
- What?: A mixture model where each component is a Gaussian distribution.
- Formula:

  p(x) = \sum_{i=1}^{k} \pi_i N(x | \mu_i, \Sigma_i)

  - π_i: Mixing weight of component i.
  - μ_i: Mean of component i.
  - Σ_i: Covariance of component i.
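The density above can be sketched in a few lines of Python. This is a minimal 1-D example; the weights, means, and standard deviations are hypothetical values chosen for illustration, and `scipy.stats.norm` stands in for N(x | μ_i, Σ_i):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D mixture with k = 2 components
pis = np.array([0.4, 0.6])     # mixing weights pi_i (must sum to 1)
mus = np.array([0.0, 5.0])     # component means mu_i
sigmas = np.array([1.0, 2.0])  # component std devs (sqrt of Sigma_i in 1-D)

def gmm_density(x):
    """p(x) = sum_i pi_i * N(x | mu_i, Sigma_i)."""
    return float(np.sum(pis * norm.pdf(x, loc=mus, scale=sigmas)))
```

Because the mixing weights sum to 1 and each component integrates to 1, `gmm_density` is itself a valid probability density.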
3. Expectation-Maximization (EM) Algorithm 🔄
- What?: Iterative algorithm to estimate parameters of mixture models.
- Steps:
  - E-Step (Expectation): Compute responsibilities, i.e. the probability that each data point belongs to each cluster:

    \tau(z_{nk}) = \frac{\pi_k N(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^k \pi_j N(x_n | \mu_j, \Sigma_j)}
  - M-Step (Maximization): Update the parameters (means, covariances, mixing weights), where N_k = \sum_{n=1}^N \tau(z_{nk}) is the effective number of points assigned to cluster k:

    \mu_k^{new} = \frac{\sum_{n=1}^N \tau(z_{nk}) x_n}{N_k}

    \Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^N \tau(z_{nk}) (x_n - \mu_k^{new})(x_n - \mu_k^{new})^T

    \pi_k^{new} = \frac{N_k}{N}
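The two steps above can be put together in a minimal EM loop. This is a 1-D sketch on synthetic data; the ground-truth clusters (at 0 and 5) and the initial guesses are assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Synthetic 1-D data: two well-separated Gaussian clusters (assumed ground truth)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 300)])
N, k = len(x), 2

# Initial guesses (EM is sensitive to these)
pis = np.full(k, 1 / k)
mus = np.array([-1.0, 1.0])
sigmas = np.ones(k)

for _ in range(100):
    # E-step: tau[n, j] = pi_j N(x_n | mu_j, sigma_j) / sum over components
    lik = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)  # shape (N, k)
    tau = lik / lik.sum(axis=1, keepdims=True)

    # M-step: update parameters using the responsibilities
    Nk = tau.sum(axis=0)                       # effective cluster sizes N_k
    mus = (tau * x[:, None]).sum(axis=0) / Nk  # responsibility-weighted means
    sigmas = np.sqrt((tau * (x[:, None] - mus) ** 2).sum(axis=0) / Nk)
    pis = Nk / N
```

After the loop, the estimated means should sit near the true cluster centers, and the mixing weights near the 200/300 split of the data.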
4. Advantages of EM ✅
- Handles missing data.
- Robust to noise.
- Guaranteed to converge to a local maximum of the log-likelihood.
- Versatile for various ML tasks.
5. Disadvantages of EM ❌
- Sensitive to initial guesses.
- Slow for high-dimensional data.
- Computationally intensive.
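A common mitigation for the first point is to run EM from several random initializations and keep the best (highest-likelihood) fit. scikit-learn's `GaussianMixture` does exactly this via its `n_init` parameter; the synthetic data below is an assumption for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data: two clusters near 0 and 5 (assumed for illustration)
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 300)]).reshape(-1, 1)

# n_init=10 runs EM from 10 random starts and keeps the best result,
# reducing sensitivity to any single bad initial guess.
gm = GaussianMixture(n_components=2, n_init=10, random_state=0).fit(X)
print(sorted(gm.means_.ravel()))  # means land near the true cluster centers
```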
Key Concepts 🔑
- GMM: Mixture of Gaussian distributions.
- EM Algorithm: Iterative parameter estimation.
- Responsibilities: Probabilities of data points belonging to clusters.
Mind Map 🧠
```
Mixture Models
├── Gaussian Mixture Model (GMM)
│   ├── Component Distributions (Gaussian)
│   └── Mixing Weights (π_i)
└── Expectation-Maximization (EM) Algorithm
    ├── E-Step (Compute Responsibilities)
    └── M-Step (Update Parameters)
```
Key Symbols 🔑
- π_i: Mixing weight for component i.
- μ_i: Mean of component i.
- Σ_i: Covariance of component i.
- τ(z_{nk}): Responsibility of data point n for cluster k.
You’re ready! 🎉 Just remember GMM = mixture of Gaussians, EM = E-Step + M-Step, and Responsibilities = probabilities! 🚀