ML2 ‐ Lec (1)

1. Density Estimation 📊

  • Formula:

    p(x) = \frac{k}{NV}
    
    • k: # of points in volume V
    • N: Total # of points
    • V: Volume around x
  • Two Approaches (contrasted in the sketch after this list):

    • KDE (Kernel Density Estimation) 🎯: Fix V, find k
    • kNN (k-Nearest Neighbors) 🎯: Fix k, find V
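
A minimal 1-D sketch of both views of p(x) = k/(NV), assuming Gaussian toy data (the function names and data are ours, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=200)                 # N = 200 points from N(0, 1)

def density_fixed_V(x, data, half_width=0.25):
    """KDE view: fix the volume V (an interval of width 2*half_width), count k."""
    V = 2 * half_width                      # in 1-D, "volume" = interval length
    k = np.sum(np.abs(data - x) <= half_width)
    return k / (len(data) * V)

def density_fixed_k(x, data, k=10):
    """kNN view: fix k, grow V until it just covers the k nearest points."""
    dists = np.sort(np.abs(data - x))
    V = 2 * dists[k - 1]                    # smallest interval holding k points
    return k / (len(data) * V)

print(density_fixed_V(0.0, data))           # both ≈ 0.40, the N(0, 1) peak
print(density_fixed_k(0.0, data))
```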

2. Kernel Density Estimation (KDE) 🧮

  • What?: Nonparametric method to estimate probability density.

  • How?: Uses a kernel function to weight nearby points.

  • Bandwidth (h):

    • Controls smoothness 🌀
    • Small h: Undersmoothing (spiky peaks) 📈
    • Large h: Oversmoothing (flat curve) 📉
  • Kernel Functions:

    • Gaussian:
      K(x; h) \propto \exp\left(-\frac{x^2}{2h^2}\right)
      
    • Tophat, Epanechnikov, Exponential, Linear, Cosine 📐
  • Steps (sketched in code below):

    1. Center a normalized kernel (e.g., Gaussian) on each data point.
    2. Sum all N kernels and divide by N, so the estimate still integrates to 1.
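
A minimal NumPy sketch of these two steps, building the estimate by hand (names and toy data are ours):

```python
import numpy as np

def gaussian_kernel(u, h):
    """Normalized Gaussian kernel with bandwidth h."""
    return np.exp(-u**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))

def kde(x_grid, data, h=0.3):
    """Step 1: a kernel per data point; step 2: sum them and divide by N."""
    # shape (len(x_grid), N): every point's kernel evaluated on the grid
    K = gaussian_kernel(x_grid[:, None] - data[None, :], h)
    return K.sum(axis=1) / len(data)

rng = np.random.default_rng(0)
data = rng.normal(size=100)
x_grid = np.linspace(-4, 4, 400)
p_hat = kde(x_grid, data)
print(p_hat.sum() * (x_grid[1] - x_grid[0]))  # ≈ 1: the estimate integrates to 1
```

Re-run with h=0.05 and h=2.0 on the same data to see the undersmoothing/oversmoothing behavior described above.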

3. Bandwidth Selection 🎚️

  • Goal: Minimize error between estimated & true density.
  • Methods:
    • Rule of Thumb:
      h = \frac{\text{max} - \text{min}}{4}
      
    • Cross-Validation (CV): pick the h that maximizes held-out (log-)likelihood (see the sketch below).
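
One common way to run that CV search is scikit-learn's KernelDensity inside GridSearchCV; the grid range below is an arbitrary illustration, and the rule-of-thumb line uses the formula above:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
data = rng.normal(size=200).reshape(-1, 1)    # sklearn expects a 2-D array

h_rule = (data.max() - data.min()) / 4        # rule of thumb from above

# GridSearchCV scores each h via KernelDensity.score,
# the total log-likelihood of the held-out fold
grid = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": np.linspace(0.05, 2.0, 40)},
    cv=5,
)
grid.fit(data)
print(h_rule, grid.best_params_["bandwidth"])
```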

4. Kernel Regression (KR) 📈

  • What?: Nonparametric regression using kernels.
  • Formula (the Nadaraya–Watson estimator; see the sketch below):
    f(x) = \frac{\sum_{i=1}^{N} \kappa_h(x - x_i) y_i}{\sum_{i=1}^{N} \kappa_h(x - x_i)}
    
    • Weights:
      w_i(x) = \frac{\kappa_h(x - x_i)}{\sum_{j=1}^{N} \kappa_h(x - x_j)}
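
A minimal sketch of this estimator with a Gaussian \kappa_h (names and toy data are ours); the kernel's normalizing constant cancels between numerator and denominator:

```python
import numpy as np

def kernel_regression(x, x_train, y_train, h=0.5):
    """Nadaraya–Watson: kernel-weighted average of the training targets."""
    w = np.exp(-(x - x_train)**2 / (2 * h**2))   # unnormalized Gaussian weights
    return (w * y_train).sum() / w.sum()         # the w_i(x) normalization

rng = np.random.default_rng(0)
x_train = np.linspace(0, 2 * np.pi, 50)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=50)
print(kernel_regression(np.pi / 2, x_train, y_train))  # ≈ sin(pi/2) = 1
```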
      

5. k-Nearest Neighbors (kNN) 🎯

  • What?: Nonparametric method for regression/classification.
  • How?:
    • Regression: Average of k nearest points.
    • Classification: Majority vote of k nearest points.
  • Formula (class probabilities; see the sketch below):
    p(c \mid x) = \frac{k_c}{k}
    
    • k_c: # of the k nearest neighbors belonging to class c
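
A minimal sketch of both modes (the helper knn_predict and the toy data are ours):

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=5, task="classification"):
    """Find the k nearest training points, then average (regression)
    or take a majority vote (classification)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if task == "regression":
        return y_train[nearest].mean()
    # classification: p(c | x) = k_c / k; predict the class with the most votes
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(np.array([0.5, 0.5]), X_train, y_train, k=3))  # -> 0
```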
    

6. Pros & Cons ⚖️

  • KDE:
    • Pros: No model fitting, flexible.
    • Cons: Memory-heavy, slow for large datasets.
  • kNN:
    • Pros: Simple, no training.
    • Cons: Stores all data, sensitive to k.

Key Symbols 🔑

  • h: Bandwidth (smoothing parameter)
  • k: # of neighbors (kNN) or points in V (KDE)
  • V: Volume around x
  • K: Kernel function

Mind Map 🧠

Density Estimation
├── KDE (fix V, find k)
│   ├── Bandwidth (h)
│   ├── Kernel Functions (Gaussian, Tophat, etc.)
│   └── Smoothness (small h = spiky, large h = flat)
└── kNN (fix k, find V)
    ├── Regression (average of k neighbors)
    └── Classification (majority vote)

You’re ready! 🎉 Just remember KDE = smooth with kernels, kNN = neighbors vote, and bandwidth controls smoothness! 🚀