ML2 ‐ Lec (2)
1. Cross-Validation 🔄
- What?: Divides data into blocks (e.g., 4-fold or 10-fold) to test model performance.
- Why?: Avoids bias from a single train-test split.
- How?: Each block is used as a test set once, and results are averaged.
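A minimal sketch of k-fold cross-validation, assuming scikit-learn and a synthetic toy dataset (neither is specified in the lecture):

```python
# Hypothetical example: 4-fold cross-validation of an SVM classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy dataset (200 samples, 2 classes) used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# cv=4: each of the 4 folds is used once as the test set; scores are averaged.
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=4)
print(scores, scores.mean())
```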
2. Underfitting / Overfitting / Best Fit 📉📈
- Underfitting: Model too simple → can’t capture data trend. 🚫
- Overfitting: Model too complex → captures noise. 📊
- Best Fit: Model captures the underlying trend without fitting the noise. ✅
3. Bias vs. Variance ⚖️
- Bias: Error from wrong assumptions (e.g., straight line for curved data). 📏
- Variance: Error from sensitivity to small changes in data. 📊
- Tradeoff: Simple models = high bias, low variance. Complex models = low bias, high variance.
4. Support Vector Classifier (SVC) 🎯
- What?: Finds the best hyperplane to separate classes.
- Margin: Distance between hyperplane and closest points (support vectors). 📏
- Soft Margin: Allows misclassifications to handle outliers. 🛠️
- Objective: Maximize the margin → minimize ||w|| (the norm of the normal vector w); see the sketch below.
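A minimal sketch of fitting a linear SVC and reading off the margin, assuming scikit-learn and a toy blob dataset (both are illustrative choices, not from the lecture):

```python
# Hypothetical example: the margin width of a linear SVC is 2 / ||w||.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)   # soft margin controlled by C

w = clf.coef_[0]                  # normal vector of the hyperplane
b = clf.intercept_[0]             # bias term
margin = 2 / np.linalg.norm(w)    # distance between the two margin boundaries

print("number of support vectors:", len(clf.support_vectors_))
print("margin width:", margin)
```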
5. SVM Objective Function 🎯
- Formula:
\text{Minimize: } \frac{1}{2} ||w||^2
\text{Constraints: } y_i(w \cdot x_i + b) \geq 1
- Lagrangian:
L_P = \frac{1}{2} ||w||^2 - \sum \alpha_i y_i (x_i \cdot w + b) + \sum \alpha_i
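Setting the derivatives of L_P to zero gives the usual stationarity conditions (a standard intermediate step, sketched here for context):

\frac{\partial L_P}{\partial w} = 0 \Rightarrow w = \sum \alpha_i y_i x_i

\frac{\partial L_P}{\partial b} = 0 \Rightarrow \sum \alpha_i y_i = 0

Substituting these back gives the dual problem, which depends on the data only through dot products x_i · x_j; this is exactly where the kernels of the next section plug in.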
6. Kernels 🌀
- What?: Transform data into higher dimensions to make it linearly separable.
- Types: Linear, Polynomial, Gaussian (RBF), etc.
- Why?: Handles non-linear data by mapping it to a space where a hyperplane can separate it.
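A minimal sketch of the idea, assuming scikit-learn: a 1-D dataset that no single threshold can separate becomes linearly separable after an explicit mapping x → (x, x²) (the data here is made up for illustration):

```python
# Hypothetical example: explicit feature map x -> (x, x^2) makes the classes separable.
import numpy as np
from sklearn.svm import SVC

# 1-D data: class 1 in the middle, class 0 on both sides (not separable by one threshold).
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Map to 2-D: a straight line can now separate the classes (threshold on x^2).
X_mapped = np.hstack([x, x ** 2])

clf = SVC(kernel="linear").fit(X_mapped, y)
print(clf.score(X_mapped, y))   # expected: 1.0 on this toy data
```

Kernels achieve the same effect without ever building X_mapped explicitly (see the kernel trick below).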
7. Decision Rule ✅
- Formula:
w \cdot u + b \geq 0
- w: Normal vector.
- u: New sample.
- b: Bias term.
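A minimal sketch of applying the decision rule with a fitted linear SVC, assuming scikit-learn (the data and the "new" sample are placeholders):

```python
# Hypothetical example: the sign of w·u + b decides the class of a new sample u.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
u = np.array([0.0, 2.0])          # a made-up "new" sample

score = np.dot(w, u) + b          # decision value w·u + b
print(score >= 0)                 # True -> positive side of the hyperplane
print(clf.decision_function([u])) # same value, computed by scikit-learn
```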
8. Key Concepts 🔑
- Support Vectors: Points closest to the hyperplane. 🎯
- Hyperplane: Decision boundary (line in 2D, plane in 3D, etc.). 📏
- Soft Margin: Allows some misclassifications for better generalization. 🛠️
Mind Map 🧠
SVM
├── Cross-Validation (4-fold, 10-fold)
├── Underfitting / Overfitting / Best Fit
├── Bias vs. Variance
├── SVC
│ ├── Hyperplane (maximize margin)
│ ├── Soft Margin (handle outliers)
│ └── Support Vectors (closest points)
├── Kernels (linear, RBF, polynomial)
└── Decision Rule (w · u + b ≥ 0)
Key Symbols 🔑
- w: Normal vector (hyperplane direction).
- b: Bias term (shifts hyperplane).
- α: Lagrange multiplier.
- x_i: Data points.
- y_i: Labels (+1 or -1).
You’re ready! 🎉 Just remember SVM = maximize margin, soft margin = handle outliers, and kernels = handle non-linear data! 🚀
1. Hinge Loss Function 📉
- What?: Measures how much a prediction violates the margin (zero when the prediction is confidently correct).
- Formula:
L(y, f(x)) = \max(0, 1 - y \cdot f(x))
- y: Actual class (-1 or +1).
- f(x): Model’s prediction.
- Cases:
- Correct and outside the margin (y·f(x) ≥ 1): Loss = 0 ✅
- Inside the margin or misclassified (y·f(x) < 1): Loss = 1 - y·f(x) ❌
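A minimal NumPy sketch of the hinge loss formula above (the sample scores are made up):

```python
# Hinge loss L(y, f(x)) = max(0, 1 - y * f(x)), element-wise over samples.
import numpy as np

def hinge_loss(y_true, scores):
    """y_true in {-1, +1}; scores are the model outputs f(x)."""
    return np.maximum(0, 1 - y_true * scores)

y_true = np.array([1, 1, -1, -1])
scores = np.array([2.0, 0.4, -1.5, 0.3])   # hypothetical f(x) values

print(hinge_loss(y_true, scores))          # [0.  0.6 0.  1.3]
```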
2. Slack Variables (ξ) 🛠️
- What?: Allow misclassifications to handle outliers.
- Conditions:
- 0 < ξ ≤ 1: Correct but within the margin.
- ξ > 1: Misclassified.
- Constraints:
y_i(w \cdot x_i + b) \geq 1 - \xi_i
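A sketch of reading slack values off a fitted soft-margin SVC, assuming scikit-learn; at the optimum ξ_i = max(0, 1 - y_i·f(x_i)), so the slacks can be computed from the decision values (the overlapping blob data is made up):

```python
# Hypothetical example: slack xi_i = max(0, 1 - y_i * f(x_i)) per training point.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=80, centers=2, cluster_std=2.5, random_state=0)
y = np.where(y01 == 1, 1, -1)             # relabel classes as -1 / +1

clf = SVC(kernel="linear", C=1.0).fit(X, y)
f = clf.decision_function(X)              # f(x_i) = w·x_i + b
xi = np.maximum(0, 1 - y * f)             # slack per training point

print("inside margin (0 < xi <= 1):", int(np.sum((xi > 0) & (xi <= 1))))
print("misclassified (xi > 1):     ", int(np.sum(xi > 1)))
```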
3. Soft Margin SVM 🎯
- What?: Allows some misclassifications for better generalization.
- Objective Function:
\min_{w, \xi} \frac{1}{2} ||w||^2 + C \sum \xi_i
- C: Regularization parameter.
- Small C: Large margin, more misclassifications allowed.
- Large C: Small margin, fewer misclassifications.
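A minimal sketch of the effect of C, assuming scikit-learn; the number of support vectors is used as a rough proxy for margin width, since more points end up inside a larger margin (the dataset and C values are illustrative):

```python
# Hypothetical example: smaller C -> larger margin -> more support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: support vectors = {clf.n_support_.sum()}")
```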
4. Kernel Trick 🌀
- What?: Transforms data into higher dimensions without explicit computation.
- Kernel Functions:
- Polynomial:
K(x_i, x_j) = (x_i \cdot x_j + r)^d
- RBF (Gaussian):
K(x_i, x_j) = \exp\left(-\gamma ||x_i - x_j||^2\right)
- Sigmoid:
K(x_i, x_j) = \tanh(\eta x_i \cdot x_j + \nu)
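A minimal sketch of computing these kernels directly on a small data matrix, assuming scikit-learn's pairwise kernel helpers (the data and the parameter values r, d, γ are made up):

```python
# Hypothetical example: kernel matrices K(x_i, x_j) without any explicit feature map.
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])

K_poly = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)  # (x_i·x_j + r)^d with r=1, d=2
K_rbf = rbf_kernel(X, gamma=0.5)                               # exp(-γ ||x_i - x_j||²)

print(K_poly)
print(K_rbf)
```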
5. Gamma (γ) in RBF 🎚️
- What?: Controls the influence of each training example.
- Low γ: Smooth decision boundary (underfitting). 📏
- High γ: Tight decision boundary (overfitting). 📊
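A minimal sketch of the underfit/overfit effect of γ, assuming scikit-learn; training accuracy stays high for large γ while cross-validated accuracy drops (the dataset and γ values are illustrative):

```python
# Hypothetical example: compare train vs cross-validated accuracy across gamma values.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

for gamma in (0.01, 1.0, 100.0):
    train_acc = SVC(kernel="rbf", gamma=gamma).fit(X, y).score(X, y)
    cv_acc = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5).mean()
    print(f"gamma={gamma}: train={train_acc:.2f}, cv={cv_acc:.2f}")
```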
6. SVM Hyperparameters 🔧
- C (Penalty Parameter):
- Large C: Small margin, strict classification.
- Small C: Large margin, lenient classification.
- Kernel:
- Polynomial, RBF, Sigmoid.
- Gamma (γ):
- Controls the reach of each data point.
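A minimal sketch of tuning these hyperparameters together with a grid search and cross-validation, assuming scikit-learn (the grid values and dataset are placeholders):

```python
# Hypothetical example: search over C, gamma, and kernel with 5-fold CV.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
    "kernel": ["rbf", "poly", "sigmoid"],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```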
7. Key Concepts 🔑
- Support Vectors: Points closest to the hyperplane. 🎯
- Hyperplane: Decision boundary (line in 2D, plane in 3D, etc.). 📏
- Kernel Trick: Avoids explicit high-dimensional transformation. 🌀
Mind Map 🧠
SVM
├── Hinge Loss (max(0, 1 - y·f(x)))
├── Slack Variables (ξ)
│ ├── 0 < ξ ≤ 1: Correct but within margin
│ └── ξ > 1: Misclassified
├── Soft Margin
│ ├── Objective: min ½||w||² + C∑ξ
│ └── C: Regularization (small = large margin, large = small margin)
├── Kernel Trick
│ ├── Polynomial: (x_i·x_j + r)^d
│ ├── RBF: exp(-γ||x_i - x_j||²)
│ └── Sigmoid: tanh(ηx_i·x_j + ν)
└── Gamma (γ)
├── Low γ: Smooth boundary (underfit)
└── High γ: Tight boundary (overfit)
Key Symbols 🔑
- w: Normal vector (hyperplane direction).
- b: Bias term (shifts hyperplane).
- ξ: Slack variable (allows misclassifications).
- C: Regularization parameter.
- γ: Gamma (controls RBF kernel reach).
You’re ready! 🎉 Just remember SVM = maximize margin, soft margin = handle outliers, and kernels = handle non-linear data! 🚀