ML2 - Lec (2)

1. Cross-Validation 🔄

  • What?: Splits the data into k folds (e.g., 4-fold or 10-fold) to estimate model performance.
  • Why?: Avoids the bias of relying on a single train-test split.
  • How?: Each fold is used as the test set exactly once while the remaining folds train the model, and the results are averaged (see the sketch below).
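A minimal sketch of 4-fold cross-validation with scikit-learn; the iris dataset and the linear SVC are purely illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")

# Each fold is held out as the test set exactly once; the scores are averaged.
scores = cross_val_score(clf, X, y, cv=4)
print(scores, scores.mean())
```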

2. Underfitting / Overfitting / Best Fit 📉📈

  • Underfitting: Model too simple → can’t capture data trend. 🚫
  • Overfitting: Model too complex → captures noise. 📊
  • Best Fit: Model captures the underlying trend without modeling the noise (see the sketch below). ✅
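A toy sketch of the three regimes, assuming synthetic sine-shaped data: a degree-1 polynomial tends to underfit (high train and test error), a degree-15 polynomial tends to overfit (low train error, higher test error), and a moderate degree sits in between:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]                          # 1-D inputs
y = np.sin(2 * np.pi * X.ravel()) + rng.normal(scale=0.2, size=60)   # noisy sine
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, moderate, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # train error
          mean_squared_error(y_te, model.predict(X_te)))   # test error
```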

3. Bias vs. Variance ⚖️

  • Bias: Error from wrong assumptions (e.g., straight line for curved data). 📏
  • Variance: Error from sensitivity to small changes in data. 📊
  • Tradeoff: Simple models = high bias, low variance. Complex models = low bias, high variance.

4. Support Vector Classifier (SVC) 🎯

  • What?: Finds the best hyperplane to separate classes.
  • Margin: Distance between hyperplane and closest points (support vectors). 📏
  • Soft Margin: Allows misclassifications to handle outliers. 🛠️
  • Objective: Maximize the margin 2/||w|| → equivalently minimize ||w|| (w is the normal vector); see the sketch below.
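A minimal sketch of a linear SVC on toy blobs (the data and the very large C value are illustrative assumptions); it reads off w, b, the margin width, and the support vectors:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)   # very large C ≈ hard margin

w, b = clf.coef_[0], clf.intercept_[0]         # hyperplane: w·x + b = 0
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```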

5. SVM Objective Function 🎯

  • Formula:
    \text{Minimize: } \frac{1}{2} ||w||^2
    
    \text{Constraints: } y_i(w \cdot x_i + b) \geq 1
    
  • Lagrangian:
    L_P = \frac{1}{2} ||w||^2 - \sum \alpha_i y_i (x_i \cdot w + b) + \sum \alpha_i
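  • Dual form (a standard step: set \partial L_P / \partial w = 0 and \partial L_P / \partial b = 0, then substitute back into L_P):
    w = \sum \alpha_i y_i x_i, \quad \sum \alpha_i y_i = 0
    
    L_D = \sum \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)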
    

6. Kernels 🌀

  • What?: Transform data into higher dimensions to make it linearly separable.
  • Types: Linear, Polynomial, Gaussian (RBF), etc.
  • Why?: Handles non-linear data by mapping it to a space where a hyperplane can separate it.
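A small sketch of the idea, assuming toy concentric-circles data: a linear kernel cannot separate the two rings, while an RBF kernel can:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Compare cross-validated accuracy of a linear vs. an RBF-kernel SVC.
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(kernel, round(scores.mean(), 3))
```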

7. Decision Rule

  • Formula:
    w \cdot u + b \geq 0
    
    • w: Normal vector.
    • u: New sample.
    • b: Bias term.
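A minimal sketch of applying the rule by hand; w, b, and u are made-up values standing in for a trained model's parameters and a new sample:

```python
import numpy as np

w = np.array([2.0, -1.0])   # normal vector (illustrative)
b = -0.5                    # bias term (illustrative)
u = np.array([1.0, 0.5])    # new sample to classify

# Classify as +1 if w·u + b >= 0, else -1.
label = +1 if np.dot(w, u) + b >= 0 else -1
print(label)   # +1 here, since 2*1.0 + (-1)*0.5 - 0.5 = 1.0 >= 0
```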

8. Key Concepts 🔑

  • Support Vectors: Points closest to the hyperplane. 🎯
  • Hyperplane: Decision boundary (line in 2D, plane in 3D, etc.). 📏
  • Soft Margin: Allows some misclassifications for better generalization. 🛠️

Mind Map 🧠

SVM
├── Cross-Validation (4-fold, 10-fold)
├── Underfitting / Overfitting / Best Fit
├── Bias vs. Variance
├── SVC
│   ├── Hyperplane (maximize margin)
│   ├── Soft Margin (handle outliers)
│   └── Support Vectors (closest points)
├── Kernels (linear, RBF, polynomial)
└── Decision Rule (w · u + b ≥ 0)

Key Symbols 🔑

  • w: Normal vector (hyperplane direction).
  • b: Bias term (shifts hyperplane).
  • α: Lagrange multiplier.
  • x_i: Data points.
  • y_i: Labels (+1 or -1).

You’re ready! 🎉 Just remember SVM = maximize margin, soft margin = handle outliers, and kernels = handle non-linear data! 🚀


1. Hinge Loss Function 📉

  • What?: Measures how far a prediction falls short of the required margin y·f(x) ≥ 1.
  • Formula:
    L(y, f(x)) = \max(0, 1 - y \cdot f(x))
    
    • y: Actual class (-1 or 1).
    • f(x): Model’s prediction.
  • Cases:
    • y·f(x) ≥ 1 (correct and outside the margin): Loss = 0 ✅
    • y·f(x) < 1 (inside the margin or misclassified): Loss = 1 - y·f(x) ❌
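A minimal sketch of the hinge loss on a few made-up labels and scores:

```python
import numpy as np

def hinge_loss(y, fx):
    """L(y, f(x)) = max(0, 1 - y*f(x)), element-wise."""
    return np.maximum(0.0, 1.0 - y * fx)

y  = np.array([+1, +1, -1, -1])        # true labels
fx = np.array([2.0, 0.3, -0.4, 1.5])   # model scores f(x)

# Loss is 0 when y*f(x) >= 1, positive inside the margin or on the wrong side.
print(hinge_loss(y, fx))   # [0.  0.7 0.6 2.5]
```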

2. Slack Variables (ξ) 🛠️

  • What?: Allow misclassifications to handle outliers.
  • Conditions:
    • ξ = 0: Correct and outside the margin.
    • 0 < ξ ≤ 1: Correct but within the margin.
    • ξ > 1: Misclassified.
  • Constraints:
    y_i(w \cdot x_i + b) \geq 1 - \xi_i
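A sketch of recovering the slack values ξ_i = max(0, 1 - y_i·f(x_i)) from a fitted soft-margin SVC; the overlapping toy data and the C value are assumptions:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=50, centers=2, cluster_std=2.5, random_state=0)
y = np.where(y01 == 1, 1, -1)               # map labels to ±1 for the formula

clf = SVC(kernel="linear", C=1.0).fit(X, y)
xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))   # slack per point

print("within margin (0 < ξ <= 1):", int(np.sum((xi > 0) & (xi <= 1))))
print("misclassified (ξ > 1):     ", int(np.sum(xi > 1)))
```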
    

3. Soft Margin SVM 🎯

  • What?: Allows some misclassifications for better generalization.
  • Objective Function:
    \min_{w, \xi} \frac{1}{2} ||w||^2 + C \sum \xi_i
    
    • C: Regularization parameter.
      • Small C: Large margin, more misclassifications.
      • Large C: Small margin, fewer misclassifications.
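A small sketch of the C trade-off on overlapping toy blobs (values are illustrative): smaller C tends to widen the margin and tolerate more violations, larger C narrows it:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=1)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])       # margin width = 2/||w||
    print(f"C={C:<6} margin={margin:.3f} #support vectors={len(clf.support_)}")
```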

4. Kernel Trick 🌀

  • What?: Computes inner products as if the data were mapped to a higher-dimensional space, without ever computing that mapping explicitly.
  • Kernel Functions:
    • Polynomial:
      K(x_i, x_j) = (x_i \cdot x_j + r)^d
      
    • RBF (Gaussian):
      K(x_i, x_j) = \exp\left(-\gamma ||x_i - x_j||^2\right)
      
    • Sigmoid:
      K(x_i, x_j) = \tanh(\eta x_i \cdot x_j + \nu)
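A minimal sketch checking these formulas numerically against scikit-learn's pairwise kernels; the parameter values are assumptions, and the sigmoid's η and ν correspond to scikit-learn's gamma and coef0:

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

x_i = np.array([[1.0, 2.0]])
x_j = np.array([[0.5, -1.0]])
r, d, gamma = 1.0, 3, 0.5          # illustrative kernel parameters

poly = (x_i @ x_j.T + r) ** d                        # (x_i·x_j + r)^d
rbf  = np.exp(-gamma * np.sum((x_i - x_j) ** 2))     # exp(-γ||x_i - x_j||²)
sig  = np.tanh(gamma * (x_i @ x_j.T) + r)            # tanh(η x_i·x_j + ν)

print(np.allclose(poly, polynomial_kernel(x_i, x_j, degree=d, gamma=1.0, coef0=r)))
print(np.allclose(rbf,  rbf_kernel(x_i, x_j, gamma=gamma)))
print(np.allclose(sig,  sigmoid_kernel(x_i, x_j, gamma=gamma, coef0=r)))
```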
      

5. Gamma (γ) in RBF 🎚️

  • What?: Controls the influence of each training example.
    • Low γ: Smooth decision boundary (underfitting). 📏
    • High γ: Tight decision boundary (overfitting). 📊
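A small sketch of γ's effect on an RBF SVC, assuming toy two-moons data and illustrative γ values; very low γ tends to underfit, very high γ tends to memorize the training set:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for gamma in (0.01, 1.0, 100.0):   # low, moderate, high
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"γ={gamma:<6} train={clf.score(X_tr, y_tr):.2f}  test={clf.score(X_te, y_te):.2f}")
```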

6. SVM Hyperparameters 🔧

  • C (Penalty Parameter):
    • Large C: Small margin, strict classification.
    • Small C: Large margin, lenient classification.
  • Kernel:
    • Polynomial, RBF, Sigmoid.
  • Gamma (γ):
    • Controls the reach of each data point.
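A minimal sketch of tuning C, γ, and the kernel together with a grid search; the dataset and the grid values are illustrative assumptions, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1, 0.01],
    "kernel": ["rbf", "poly", "sigmoid"],
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```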

7. Key Concepts 🔑

  • Support Vectors: Points closest to the hyperplane. 🎯
  • Hyperplane: Decision boundary (line in 2D, plane in 3D, etc.). 📏
  • Kernel Trick: Avoids explicit high-dimensional transformation. 🌀

Mind Map 🧠

SVM
├── Hinge Loss (max(0, 1 - y·f(x)))
├── Slack Variables (ξ)
│   ├── 0 < ξ ≤ 1: Correct but within margin
│   └── ξ > 1: Misclassified
├── Soft Margin
│   ├── Objective: min ½||w||² + C∑ξ
│   └── C: Regularization (small = large margin, large = small margin)
├── Kernel Trick
│   ├── Polynomial: (x_i·x_j + r)^d
│   ├── RBF: exp(-γ||x_i - x_j||²)
│   └── Sigmoid: tanh(ηx_i·x_j + ν)
└── Gamma (γ)
    ├── Low γ: Smooth boundary (underfit)
    └── High γ: Tight boundary (overfit)

Key Symbols 🔑

  • w: Normal vector (hyperplane direction).
  • b: Bias term (shifts hyperplane).
  • ξ: Slack variable (allows misclassifications).
  • C: Regularization parameter.
  • γ: Gamma (controls RBF kernel reach).

You’re ready! 🎉 Just remember SVM = maximize margin, soft margin = handle outliers, and kernels = handle non-linear data! 🚀