ML2 ‐ Lec (6)

📉 Dimensionality Reduction

Definition: Reducing the number of features in a dataset while keeping important information.

Curse of Dimensionality

🔹 More dimensions require exponentially more data.
🔹 Not all features contribute useful information.
🔹 Example cases:

  • HD Images: Do all pixels matter?
  • Videos: Only key frames matter.
  • Text: Keywords are more important than all words.

🎯 Why Reduce Dimensions?

Faster Computation: Fewer dimensions → quicker models.
Better Visualization: We can’t visualize >3D data.
Remove Noise: Irrelevant/redundant features hurt model performance.


🏆 Types of Dimensionality Reduction

1️⃣ Feature Selection

👉 Select important features from the dataset.
👉 Methods:

  • Filter Methods (Statistical tests)
  • Wrapper Methods (Use ML models)
  • Embedded Methods (Built-in feature selection)

2️⃣ Feature Extraction

👉 Create new features that represent the data.
👉 Example: PCA (Principal Component Analysis).


📌 Feature Selection

✅ What are Features?

  • Features are attributes that define an instance (e.g., height, weight).
  • Features help classify instances into different classes.

🛠 Types of Features

Relevant: Directly impacts the output.
Irrelevant: No effect on output.
Redundant: Can be replaced by another feature.

🔍 Feature Selection Methods

  1. Filter Methods: Use statistical techniques before training (see the scikit-learn sketch after this list).

    • LDA (Linear Discriminant Analysis)
    • ANOVA (Analysis of Variance)
    • Chi-Square Test
    • Pearson’s Correlation
  2. Wrapper Methods: Use a model to test different feature subsets.

    • Forward Selection (Start with nothing, add features)
    • Backward Elimination (Start with all, remove unnecessary ones)
    • Recursive Feature Elimination (RFE)
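
Below is a minimal scikit-learn sketch of one filter method (chi-square scoring) and one wrapper method (RFE). The iris dataset, the choice of 2 kept features, and the logistic-regression estimator are illustrative assumptions, not part of the lecture.

```python
# Filter vs. wrapper feature selection -- a minimal sketch (assumed setup: iris data).
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter: score each feature with a chi-square test, keep the 2 best.
filter_selector = SelectKBest(score_func=chi2, k=2)
X_filtered = filter_selector.fit_transform(X, y)
print("Chi-square scores:", filter_selector.scores_)

# Wrapper: recursive feature elimination with a model in the loop.
wrapper_selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
X_wrapped = wrapper_selector.fit_transform(X, y)
print("Features kept by RFE:", wrapper_selector.support_)
```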

📌 Comparison of Methods

| Method | Pros | Cons |
|--------|------|------|
| Filter | Fast, independent of model | May keep redundant features |
| Wrapper | Finds best feature subset | Expensive, can overfit |
| Embedded | Optimized during training | May depend on specific model |

🏗 Feature Extraction

👉 Instead of selecting features, create new features.

🔹 Principal Component Analysis (PCA)

Transforms high-dimensional data into fewer variables (Principal Components).
✔ Maximizes variance while minimizing information loss.
✔ Converts correlated features into uncorrelated components.

📌 PCA Steps (a NumPy sketch follows the steps)

1️⃣ Standardization: Normalize data.
2️⃣ Compute Covariance Matrix: Find feature relationships.
3️⃣ Find Eigenvalues & Eigenvectors: Identify principal components.
4️⃣ Sort Components: Rank based on importance.
5️⃣ Select Top Components: Keep only the most important ones.
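
A minimal NumPy sketch of the five steps above; the random data matrix and the choice of 2 components are assumptions for illustration.

```python
# PCA from scratch with NumPy -- follows the five steps listed above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # assumed toy data: 100 samples, 5 features

# 1. Standardization: zero mean, unit variance per feature.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized features.
cov = np.cov(Z, rowvar=False)

# 3. Eigenvalues (variance captured) and eigenvectors (directions).
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort components by eigenvalue, largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 5. Keep the top k components and project the data onto them.
k = 2
X_reduced = Z @ eigenvectors[:, :k]
print(X_reduced.shape)                 # (100, 2)
```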

📌 Formula for Eigenvalues
For a 2×2 covariance matrix, the characteristic equation det(Cov − λI) = 0 is a quadratic in λ, so the eigenvalues follow from the quadratic formula:
\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
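
Worked out for a generic 2×2 covariance matrix (the entries s_{11}, s_{12}, s_{22} are notation introduced here for illustration, not symbols from the lecture):

\text{det}\begin{pmatrix} s_{11} - \lambda & s_{12} \\ s_{12} & s_{22} - \lambda \end{pmatrix} = (s_{11} - \lambda)(s_{22} - \lambda) - s_{12}^2 = 0
\lambda^2 - (s_{11} + s_{22})\lambda + (s_{11}s_{22} - s_{12}^2) = 0
\lambda = \frac{(s_{11} + s_{22}) \pm \sqrt{(s_{11} + s_{22})^2 - 4(s_{11}s_{22} - s_{12}^2)}}{2}

Here a = 1, b = −(s_{11} + s_{22}), and c = s_{11}s_{22} − s_{12}^2 are the coefficients plugged into the quadratic formula above.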

🚀 Key Takeaway: PCA helps simplify data while preserving patterns!


🔄 Comparison of Feature Selection vs Feature Extraction

| Criteria | Feature Selection 🏗 | Feature Extraction 🛠 |
|----------|----------------------|------------------------|
| Approach | Select important features | Create new features |
| Output | Subset of original features | Transformed features |
| Common Methods | LDA, Chi-Square | PCA, t-SNE |
| Best For | Removing redundancy | Reducing complexity |

🎯 Key Takeaways

Feature Selection → Remove irrelevant/redundant features.
Feature Extraction (PCA) → Create new features while keeping essential information.
Use PCA when feature correlation is high and dimension reduction is needed.



1. Dimensionality Reduction 📉

  • What?: Reduce the number of features (dimensions) while retaining important information.
  • Why?:
    • Improve computation efficiency. ⚡
    • Simplify visualization. 📊
    • Remove irrelevant/redundant data. 🗑️

2. Feature Selection 🎯

  • What?: Select a subset of original features.
  • Methods:
    • Filter Methods: Use statistical measures (e.g., correlation, mutual information).
    • Wrapper Methods: Use a model to evaluate feature subsets (e.g., forward selection, backward elimination).
    • Embedded Methods: Feature selection during model training (e.g., Lasso, Ridge); see the Lasso sketch below.
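
A minimal sketch of the embedded idea using Lasso; the synthetic regression data and the alpha value are assumptions. The L1 penalty pushes the coefficients of uninformative features to exactly zero, so selection happens as a side effect of training.

```python
# Embedded feature selection with Lasso -- a minimal sketch on assumed synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# The L1 penalty zeroes out coefficients of features that do not help the fit.
model = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_)
print("Indices of features kept by Lasso:", selected)
```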

3. Feature Extraction 🛠️

  • What?: Create new features from original ones.
  • PCA (Principal Component Analysis):
    • Transforms data into uncorrelated components (PCs).
    • Steps:
      1. Standardize data.
      2. Compute covariance matrix.
      3. Find eigenvalues and eigenvectors.
      4. Select top PCs (highest variance).
    • Goal: Retain maximum variance with fewer dimensions.

4. PCA Steps 📝

  1. Standardize Data:
    Z = \frac{X - \mu}{\sigma}
    
  2. Covariance Matrix:
    \text{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
    
  3. Eigenvalues & Eigenvectors:
    • Solve:
      \text{det}(\text{Covariance Matrix} - \lambda I) = 0
      
  4. Select Top PCs: Keep PCs with highest eigenvalues (variance); a scikit-learn sketch follows.
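
A minimal scikit-learn sketch of the same pipeline (standardize, then PCA); the iris data and the choice of 2 components are assumptions. `explained_variance_` reports the eigenvalue (variance) of each principal component.

```python
# PCA via scikit-learn -- standardize, fit, and inspect the variance per component.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

Z = StandardScaler().fit_transform(X)      # Z = (X - mu) / sigma
pca = PCA(n_components=2).fit(Z)           # covariance + eigen-decomposition inside

print(pca.explained_variance_)             # eigenvalues (variance per PC)
print(pca.explained_variance_ratio_)       # fraction of total variance per PC

X_reduced = pca.transform(Z)               # projection onto the top 2 PCs
print(X_reduced.shape)                     # (150, 2)
```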

5. Key Concepts 🔑

  • Eigenvalues: Measure of variance captured by each PC.
  • Eigenvectors: Directions of maximum variance.
  • Covariance Matrix: Shows how features vary together.
  • Mutual Information (MI): Measures dependency between features and target (see the sketch below).
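
A minimal sketch of scoring features by mutual information with the target; the iris data is an assumption for illustration.

```python
# Mutual information as a filter criterion -- higher score = stronger dependency with the target.
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
mi_scores = mutual_info_classif(X, y, random_state=0)
print("MI score per feature:", mi_scores)
```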

Mind Map 🧠

Dimensionality Reduction
├── Feature Selection
│   ├── Filter Methods (e.g., MI, Correlation)
│   ├── Wrapper Methods (e.g., Forward Selection)
│   └── Embedded Methods (e.g., Lasso)
└── Feature Extraction
    ├── PCA (Principal Component Analysis)
    │   ├── Standardize Data
    │   ├── Covariance Matrix
    │   ├── Eigenvalues & Eigenvectors
    │   └── Select Top PCs
    └── Other Methods (e.g., t-SNE, UMAP)

Key Symbols 🔑

  • X: Original data.
  • Z: Standardized data.
  • λ: Eigenvalues.
  • v: Eigenvectors.
  • Cov: Covariance matrix.

You’re ready! 🎉 Just remember PCA = reduce dimensions, Eigenvalues = variance, and Feature Selection = pick best features! 🚀