ML2 - Equations

1. Entropy 📉

  • What?: Measures uncertainty or randomness in a dataset.

  • Formula:

    Entropy(S) = -\sum_{i=1}^c p_i \log_2(p_i)
    
    • p_i: Proportion of class i in the dataset.
    • c: Number of classes.
  • Steps to Solve:

    1. Calculate the proportion of each class in the dataset (p_i).
    2. Use a calculator to compute log₂(p_i) for each class (log₂ x = ln x / ln 2).
    3. Multiply each p_i by its log₂(p_i).
    4. Sum the results for all classes.
    5. Multiply the sum by -1 to get the entropy (see the sketch below).
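
A minimal Python sketch of these steps (the function name `entropy` and the toy labels are illustrative, not from the original page):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    # Step 1: proportion of each class (p_i).
    proportions = [count / n for count in Counter(labels).values()]
    # Steps 2-5: sum p_i * log2(p_i) over all classes, then negate.
    return -sum(p * log2(p) for p in proportions)

# Example: 9 positives, 5 negatives -> about 0.940 bits.
print(entropy(["+"] * 9 + ["-"] * 5))
```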

2. Information Gain (IG) 📈

  • What?: Measures the reduction in entropy after splitting the dataset.

  • Formula:

    IG(S, A) = Entropy(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} Entropy(S_v)
    
    • S: Dataset.
    • A: Attribute.
    • S_v: Subset of data where attribute A has value v.
  • Steps to Solve:

    1. Calculate the entropy of the entire dataset (Entropy(S)).
    2. Split the dataset based on attribute A.
    3. Calculate the entropy for each subset (Entropy(S_v)).
    4. Compute the weighted sum of subset entropies.
    5. Subtract the weighted sum from the original entropy (see the sketch below).
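
A minimal sketch of these steps (the attribute values and labels are toy data; `entropy` repeats the function from the previous sketch so the block runs on its own):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG(S, A) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v))."""
    # Step 2: split the labels by the attribute's value (the subsets S_v).
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)
    # Steps 3-4: weighted sum of the subset entropies.
    n = len(labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    # Step 5: subtract from the original entropy.
    return entropy(labels) - weighted

# A perfectly separating attribute recovers the full entropy (IG = 1 bit here).
print(information_gain(["a", "a", "b", "b"], ["+", "+", "-", "-"]))
```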

3. Gini Index 🎯

  • What?: Measures impurity in a dataset.

  • Formula:

    Gini(S) = 1 - \sum_{i=1}^c p_i^2
    
    • p_i: Proportion of class i in the dataset.
    • c: Number of classes.
  • Steps to Solve:

    1. Calculate the proportion of each class (p_i).
    2. Square each proportion using a calculator.
    3. Sum the squared proportions.
    4. Subtract the sum from 1 (see the sketch below).
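
A minimal sketch of the same computation in Python (names and labels are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum over classes of p_i^2."""
    n = len(labels)
    # Steps 1-3: square each class proportion and sum.
    total = sum((count / n) ** 2 for count in Counter(labels).values())
    # Step 4: subtract from 1.
    return 1 - total

# Example: 9 positives, 5 negatives -> about 0.459.
print(gini(["+"] * 9 + ["-"] * 5))
```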

4. Weighted Gini Index ⚖️

  • What?: Measures impurity after splitting the dataset.

  • Formula:

    Gini_{split}(S, A) = \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} Gini(S_v)
    
    • S: Dataset.
    • A: Attribute.
    • S_v: Subset of data where attribute A has value v.
  • Steps to Solve:

    1. Split the dataset based on attribute A.
    2. Calculate the Gini Index for each subset (Gini(S_v)).
    3. Compute the weighted sum of the subset Gini Indices (see the sketch below).
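
A minimal sketch of the split computation (toy data; the `gini` helper repeats the function from the previous sketch so the block is self-contained):

```python
from collections import Counter, defaultdict

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(values, labels):
    """Gini_split(S, A) = sum(|S_v|/|S| * Gini(S_v))."""
    # Step 1: split by attribute value.
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)
    # Steps 2-3: weighted sum of the subset Gini Indices.
    n = len(labels)
    return sum(len(s) / n * gini(s) for s in subsets.values())

# Pure subsets give the best (lowest) possible weighted Gini: 0.
print(gini_split(["a", "a", "b", "b"], ["+", "+", "-", "-"]))
```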

5. Principal Component Analysis (PCA) 📊

  • What?: Reduces dimensionality by transforming data into uncorrelated components.
  • Steps to Solve:
    1. Standardize Data:
      Z = \frac{X - \mu}{\sigma}
      
    2. Compute Covariance Matrix:
      \text{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
      
    3. Find Eigenvalues & Eigenvectors: Solve
      \text{det}(\text{Covariance Matrix} - \lambda I) = 0
      for the eigenvalues λ, then solve (\text{Covariance Matrix} - \lambda I)v = 0 for each eigenvector v.
      
    4. Select Top PCs: Keep the eigenvectors with the largest eigenvalues and project the data onto them (see the sketch below).
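
A minimal NumPy sketch of the four steps (the random toy data and `k = 2` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # toy data: 100 samples, 3 features

# Step 1: standardize, Z = (X - mu) / sigma.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix (rowvar=False treats columns as variables).
cov = np.cov(Z, rowvar=False)

# Step 3: eigenvalues/eigenvectors; eigh handles symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: keep the k eigenvectors with the largest eigenvalues and project.
k = 2
top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
X_reduced = Z @ top
print(X_reduced.shape)                     # (100, 2)
```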

6. K-Means Clustering 🔢

  • What?: Partitional clustering algorithm that divides the data into k clusters, each represented by the mean (centroid) of its points.
  • Steps to Solve:
    1. Randomly initialize k centroids.
    2. Assign each point to the nearest centroid.
    3. Recalculate centroids as the mean of assigned points.
    4. Repeat steps 2-3 until convergence (no change in centroids); see the sketch below.
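
A minimal NumPy sketch of the loop (the toy data is illustrative; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids, axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```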

7. Gaussian Mixture Model (GMM) 🎲

  • What?: Probabilistic model representing data as a mixture of Gaussians.
  • Formula:
    p(x) = \sum_{i=1}^k \pi_i N(x | \mu_i, \Sigma_i)
    
    • π_i: Mixing weight of component i (π_i ≥ 0 and ∑ π_i = 1).
    • μ_i: Mean of component i.
    • Σ_i: Covariance of component i (see the sketch below).
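
A minimal sketch of evaluating the mixture density p(x) in 1-D (the weights, means, and spreads are made-up numbers; note that `scipy.stats.norm` takes a standard deviation, i.e. √Σ_i):

```python
from scipy.stats import norm

pis = [0.6, 0.4]      # mixing weights (sum to 1)
mus = [0.0, 5.0]      # component means
sds = [1.0, 2.0]      # component standard deviations (sqrt of variances)

def mixture_pdf(x):
    """p(x) = sum_i pi_i * N(x | mu_i, sigma_i^2)."""
    return sum(pi * norm.pdf(x, loc=mu, scale=sd)
               for pi, mu, sd in zip(pis, mus, sds))

print(mixture_pdf(1.0))
```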

8. Expectation-Maximization (EM) Algorithm 🔄

  • What?: Iterative algorithm to estimate GMM parameters.
  • Steps to Solve:
    1. E-Step (Expectation): Compute responsibilities:
      \tau(z_{nk}) = \frac{\pi_k N(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^k \pi_j N(x_n | \mu_j, \Sigma_j)}
      
    2. M-Step (Maximization): Update parameters, where N_k = \sum_{n=1}^N \tau(z_{nk}) is the effective number of points assigned to component k (see the sketch below):
      \mu_k^{new} = \frac{\sum_{n=1}^N \tau(z_{nk}) x_n}{N_k}
      
      \Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^N \tau(z_{nk}) (x_n - \mu_k^{new})(x_n - \mu_k^{new})^T
      
      \pi_k^{new} = \frac{N_k}{N}
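
A minimal NumPy/SciPy sketch of EM for a 1-D two-component GMM (the synthetic data and initial guesses are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
K, N = 2, len(x)

# Crude initial guesses.
pi = np.full(K, 1 / K)
mu = np.array([x.min(), x.max()])
sd = np.full(K, x.std())

for _ in range(50):
    # E-step: responsibilities tau[n, k] proportional to pi_k * N(x_n | mu_k, sd_k^2).
    tau = pi * norm.pdf(x[:, None], loc=mu, scale=sd)
    tau /= tau.sum(axis=1, keepdims=True)

    # M-step: update parameters, with N_k = sum_n tau[n, k].
    Nk = tau.sum(axis=0)
    mu = (tau * x[:, None]).sum(axis=0) / Nk
    sd = np.sqrt((tau * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    pi = Nk / N

print(mu, sd, pi)   # should recover means near 0 and 5
```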
      

Key Equations Summary 🔑

| Concept | Equation |
|---------|----------|
| Entropy | Entropy(S) = -∑ p_i log₂(p_i) |
| Information Gain | IG(S, A) = Entropy(S) - ∑ (\|S_v\|/\|S\|) Entropy(S_v) |
| Gini Index | Gini(S) = 1 - ∑ p_i² |
| Weighted Gini | Gini_split(S, A) = ∑ (\|S_v\|/\|S\|) Gini(S_v) |
| PCA | Cov(X, Y) = ∑ (X_i - X̄)(Y_i - Ȳ) / (n-1) |
| K-Means | Assign points to nearest centroid, recalculate centroids. |
| GMM | p(x) = ∑ π_i N(x \| μ_i, Σ_i) |
| EM Algorithm | E-Step: compute responsibilities. M-Step: update parameters. |

Mind Map 🧠

Machine Learning Equations
├── Entropy (Uncertainty)
├── Information Gain (Reduction in Entropy)
├── Gini Index (Impurity)
├── Weighted Gini (Impurity after Split)
├── PCA (Dimensionality Reduction)
├── K-Means (Clustering)
├── GMM (Mixture of Gaussians)
└── EM Algorithm (Parameter Estimation)

How to Solve Equations 🛠️

  1. Entropy:

    • Calculate proportions (p_i).
    • Use a calculator to compute log₂(p_i).
    • Multiply each p_i by its log₂(p_i).
    • Sum the results for all classes.
    • Multiply the sum by -1 to get the entropy.
  2. Information Gain:

    • Calculate the entropy before and after the split.
    • Subtract the weighted sum of subset entropies from the original entropy.
  3. Gini Index:

    • Calculate proportions (p_i).
    • Square each proportion using a calculator.
    • Sum the squared proportions.
    • Subtract the sum from 1.
  4. Weighted Gini:

    • Split dataset, calculate Gini for each subset.
    • Compute weighted sum.
  5. PCA:

    • Standardize data, compute covariance matrix.
    • Find eigenvalues/eigenvectors, select top PCs.
  6. K-Means:

    • Initialize centroids, assign points, recalculate centroids.
  7. GMM:

    • Model data as a mixture of Gaussians.
  8. EM Algorithm:

    • E-Step: Compute responsibilities.
    • M-Step: Update parameters.