ML2 - Lec (2)

1. Cross-Validation 🔄

  • What?: Splits the data into k folds (e.g., 4-fold or 10-fold) to estimate model performance.
  • Why?: Avoids the bias of relying on a single train-test split.
  • How?: Each fold is used as the test set exactly once while the remaining folds train the model, and the results are averaged (see the sketch below).
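A minimal sketch of 4-fold cross-validation with scikit-learn; the iris dataset and the linear SVC are purely illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")

# Each fold is held out as the test set exactly once; the scores are averaged.
scores = cross_val_score(clf, X, y, cv=4)
print(scores, scores.mean())
```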

2. Underfitting / Overfitting / Best Fit 📉📈

  • Underfitting: Model too simple → can’t capture data trend. 🚫
  • Overfitting: Model too complex → captures noise. 📊
  • Best Fit: Model captures the underlying trend without modeling the noise (see the sketch below). ✅
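A toy sketch of the three regimes, assuming synthetic sine-shaped data: a degree-1 polynomial tends to underfit (high train and test error), a degree-15 polynomial tends to overfit (low train error, higher test error), and a moderate degree sits in between:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]                          # 1-D inputs
y = np.sin(2 * np.pi * X.ravel()) + rng.normal(scale=0.2, size=60)   # noisy sine
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, moderate, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # train error
          mean_squared_error(y_te, model.predict(X_te)))   # test error
```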

3. Bias vs. Variance ⚖️

  • Bias: Error from wrong assumptions (e.g., straight line for curved data). 📏
  • Variance: Error from sensitivity to small changes in data. 📊
  • Tradeoff: Simple models = high bias, low variance. Complex models = low bias, high variance.

4. Support Vector Classifier (SVC) 🎯

  • What?: Finds the best hyperplane to separate classes.
  • Margin: Distance between hyperplane and closest points (support vectors). 📏
  • Soft Margin: Allows misclassifications to handle outliers. 🛠️
  • Objective: Maximize the margin 2/||w|| → equivalently minimize ||w|| (w is the normal vector); see the sketch below.
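A minimal sketch of a linear SVC on toy blobs (the data and the very large C value are illustrative assumptions); it reads off w, b, the margin width, and the support vectors:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)   # very large C ≈ hard margin

w, b = clf.coef_[0], clf.intercept_[0]         # hyperplane: w·x + b = 0
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```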

5. SVM Objective Function 🎯

  • Formula:
    \text{Minimize: } \frac{1}{2} ||w||^2
    
    \text{Constraints: } y_i(w \cdot x_i + b) \geq 1
    
  • Lagrangian:
    L_P = \frac{1}{2} ||w||^2 - \sum \alpha_i y_i (x_i \cdot w + b) + \sum \alpha_i
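  • Dual form (a standard step: set \partial L_P / \partial w = 0 and \partial L_P / \partial b = 0, then substitute back into L_P):
    w = \sum \alpha_i y_i x_i, \quad \sum \alpha_i y_i = 0
    
    L_D = \sum \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)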
    

6. Kernels 🌀

  • What?: Transform data into higher dimensions to make it linearly separable.
  • Types: Linear, Polynomial, Gaussian (RBF), etc.
  • Why?: Handles non-linear data by mapping it to a space where a hyperplane can separate it.
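A small sketch of the idea, assuming toy concentric-circles data: a linear kernel cannot separate the two rings, while an RBF kernel can:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Compare cross-validated accuracy of a linear vs. an RBF-kernel SVC.
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(kernel, round(scores.mean(), 3))
```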

7. Decision Rule

  • Formula:
    w \cdot u + b \geq 0
    
    • w: Normal vector.
    • u: New sample.
    • b: Bias term.
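A minimal sketch of applying the rule by hand; w, b, and u are made-up values standing in for a trained model's parameters and a new sample:

```python
import numpy as np

w = np.array([2.0, -1.0])   # normal vector (illustrative)
b = -0.5                    # bias term (illustrative)
u = np.array([1.0, 0.5])    # new sample to classify

# Classify as +1 if w·u + b >= 0, else -1.
label = +1 if np.dot(w, u) + b >= 0 else -1
print(label)   # +1 here, since 2*1.0 + (-1)*0.5 - 0.5 = 1.0 >= 0
```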

8. Key Concepts 🔑

  • Support Vectors: Points closest to the hyperplane. 🎯
  • Hyperplane: Decision boundary (line in 2D, plane in 3D, etc.). 📏
  • Soft Margin: Allows some misclassifications for better generalization. 🛠️

Mind Map 🧠

SVM
├── Cross-Validation (4-fold, 10-fold)
├── Underfitting / Overfitting / Best Fit
├── Bias vs. Variance
├── SVC
│   ├── Hyperplane (maximize margin)
│   ├── Soft Margin (handle outliers)
│   └── Support Vectors (closest points)
├── Kernels (linear, RBF, polynomial)
└── Decision Rule (w · u + b ≥ 0)

Key Symbols 🔑

  • w: Normal vector (hyperplane direction).
  • b: Bias term (shifts hyperplane).
  • α: Lagrange multiplier.
  • x_i: Data points.
  • y_i: Labels (+1 or -1).

You’re ready! 🎉 Just remember SVM = maximize margin, soft margin = handle outliers, and kernels = handle non-linear data! 🚀


1. Hinge Loss Function 📉

  • What?: Measures how far a prediction falls short of the required margin y·f(x) ≥ 1.
  • Formula:
    L(y, f(x)) = \max(0, 1 - y \cdot f(x))
    
    • y: Actual class (-1 or 1).
    • f(x): Model’s prediction.
  • Cases:
    • y·f(x) ≥ 1 (correct and outside the margin): Loss = 0 ✅
    • y·f(x) < 1 (inside the margin or misclassified): Loss = 1 - y·f(x) ❌
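A minimal sketch of the hinge loss on a few made-up labels and scores:

```python
import numpy as np

def hinge_loss(y, fx):
    """L(y, f(x)) = max(0, 1 - y*f(x)), element-wise."""
    return np.maximum(0.0, 1.0 - y * fx)

y  = np.array([+1, +1, -1, -1])        # true labels
fx = np.array([2.0, 0.3, -0.4, 1.5])   # model scores f(x)

# Loss is 0 when y*f(x) >= 1, positive inside the margin or on the wrong side.
print(hinge_loss(y, fx))   # [0.  0.7 0.6 2.5]
```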

2. Slack Variables (ξ) 🛠️

  • What?: Allow misclassifications to handle outliers.
  • Conditions:
    • ξ = 0: Correct and outside the margin.
    • 0 < ξ ≤ 1: Correct but within the margin.
    • ξ > 1: Misclassified.
  • Constraints:
    y_i(w \cdot x_i + b) \geq 1 - \xi_i
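A sketch of recovering the slack values ξ_i = max(0, 1 - y_i·f(x_i)) from a fitted soft-margin SVC; the overlapping toy data and the C value are assumptions:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=50, centers=2, cluster_std=2.5, random_state=0)
y = np.where(y01 == 1, 1, -1)               # map labels to ±1 for the formula

clf = SVC(kernel="linear", C=1.0).fit(X, y)
xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))   # slack per point

print("within margin (0 < ξ <= 1):", int(np.sum((xi > 0) & (xi <= 1))))
print("misclassified (ξ > 1):     ", int(np.sum(xi > 1)))
```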
    

3. Soft Margin SVM 🎯

  • What?: Allows some misclassifications for better generalization.
  • Objective Function:
    \min_{w, \xi} \frac{1}{2} ||w||^2 + C \sum \xi_i
    
    • C: Regularization parameter.
      • Small C: Large margin, more misclassifications.
      • Large C: Small margin, fewer misclassifications.
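A small sketch of the C trade-off on overlapping toy blobs (values are illustrative): smaller C tends to widen the margin and tolerate more violations, larger C narrows it:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=1)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])       # margin width = 2/||w||
    print(f"C={C:<6} margin={margin:.3f} #support vectors={len(clf.support_)}")
```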

4. Kernel Trick 🌀

  • What?: Computes inner products as if the data were mapped to a higher-dimensional space, without ever computing that mapping explicitly.
  • Kernel Functions:
    • Polynomial:
      K(x_i, x_j) = (x_i \cdot x_j + r)^d
      
    • RBF (Gaussian):
      K(x_i, x_j) = \exp\left(-\gamma ||x_i - x_j||^2\right)
      
    • Sigmoid:
      K(x_i, x_j) = \tanh(\eta x_i \cdot x_j + \nu)
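A minimal sketch checking these formulas numerically against scikit-learn's pairwise kernels; the parameter values are assumptions, and the sigmoid's η and ν correspond to scikit-learn's gamma and coef0:

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

x_i = np.array([[1.0, 2.0]])
x_j = np.array([[0.5, -1.0]])
r, d, gamma = 1.0, 3, 0.5          # illustrative kernel parameters

poly = (x_i @ x_j.T + r) ** d                        # (x_i·x_j + r)^d
rbf  = np.exp(-gamma * np.sum((x_i - x_j) ** 2))     # exp(-γ||x_i - x_j||²)
sig  = np.tanh(gamma * (x_i @ x_j.T) + r)            # tanh(η x_i·x_j + ν)

print(np.allclose(poly, polynomial_kernel(x_i, x_j, degree=d, gamma=1.0, coef0=r)))
print(np.allclose(rbf,  rbf_kernel(x_i, x_j, gamma=gamma)))
print(np.allclose(sig,  sigmoid_kernel(x_i, x_j, gamma=gamma, coef0=r)))
```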
      

5. Gamma (γ) in RBF 🎚️

  • What?: Controls the influence of each training example.
    • Low γ: Smooth decision boundary (underfitting). 📏
    • High γ: Tight decision boundary (overfitting). 📊
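A small sketch of γ's effect on an RBF SVC, assuming toy two-moons data and illustrative γ values; very low γ tends to underfit, very high γ tends to memorize the training set:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for gamma in (0.01, 1.0, 100.0):   # low, moderate, high
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"γ={gamma:<6} train={clf.score(X_tr, y_tr):.2f}  test={clf.score(X_te, y_te):.2f}")
```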

6. SVM Hyperparameters 🔧

  • C (Penalty Parameter):
    • Large C: Small margin, strict classification.
    • Small C: Large margin, lenient classification.
  • Kernel:
    • Polynomial, RBF, Sigmoid.
  • Gamma (γ):
    • Controls the reach of each data point.
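A minimal sketch of tuning C, γ, and the kernel together with a grid search; the dataset and the grid values are illustrative assumptions, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1, 0.01],
    "kernel": ["rbf", "poly", "sigmoid"],
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```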

7. Key Concepts 🔑

  • Support Vectors: Points closest to the hyperplane. 🎯
  • Hyperplane: Decision boundary (line in 2D, plane in 3D, etc.). 📏
  • Kernel Trick: Avoids explicit high-dimensional transformation. 🌀

Mind Map 🧠

SVM
├── Hinge Loss (max(0, 1 - y·f(x)))
├── Slack Variables (ξ)
│   ├── 0 < ξ ≤ 1: Correct but within margin
│   └── ξ > 1: Misclassified
├── Soft Margin
│   ├── Objective: min ½||w||² + C∑ξ
│   └── C: Regularization (small = large margin, large = small margin)
├── Kernel Trick
│   ├── Polynomial: (x_i·x_j + r)^d
│   ├── RBF: exp(-γ||x_i - x_j||²)
│   └── Sigmoid: tanh(ηx_i·x_j + ν)
└── Gamma (γ)
    ├── Low γ: Smooth boundary (underfit)
    └── High γ: Tight boundary (overfit)

Key Symbols 🔑

  • w: Normal vector (hyperplane direction).
  • b: Bias term (shifts hyperplane).
  • ξ: Slack variable (allows misclassifications).
  • C: Regularization parameter.
  • γ: Gamma (controls RBF kernel reach).

You’re ready! 🎉 Just remember SVM = maximize margin, soft margin = handle outliers, and kernels = handle non-linear data! 🚀