ML2 ‐ Lec (6)
📉 Dimensionality Reduction
Definition: Reducing the number of features in a dataset while keeping important information.
❌ Curse of Dimensionality
🔹 More dimensions require exponentially more data.
🔹 Not all features contribute useful information.
🔹 Example cases:
- HD Images: Do all pixels matter?
- Videos: Only key frames matter.
- Text: Keywords are more important than all words.
🎯 Why Reduce Dimensions?
✔ Faster Computation: Fewer dimensions → quicker models.
✔ Better Visualization: We can’t visualize >3D data.
✔ Remove Noise: Irrelevant/redundant features degrade model performance.
🏆 Types of Dimensionality Reduction
1️⃣ Feature Selection
👉 Select important features from the dataset.
👉 Methods:
- Filter Methods (Statistical tests)
- Wrapper Methods (Use ML models)
- Embedded Methods (Built-in feature selection)
2️⃣ Feature Extraction
👉 Create new features that represent the data.
👉 Example: PCA (Principal Component Analysis).
📌 Feature Selection
✅ What are Features?
- Features are attributes that define an instance (e.g., height, weight).
- Features help classify instances into different classes.
🛠 Types of Features
✔ Relevant: Directly impacts the output.
✔ Irrelevant: No effect on output.
✔ Redundant: Can be replaced by another feature.
🔍 Feature Selection Methods
- Filter Methods: Use statistical techniques before training.
  - LDA (Linear Discriminant Analysis)
  - ANOVA (Analysis of Variance)
  - Chi-Square Test
  - Pearson’s Correlation
- Wrapper Methods: Use a model to test different feature subsets.
  - Forward Selection (Start with nothing, add features)
  - Backward Elimination (Start with all, remove unnecessary ones)
  - Recursive Feature Elimination (RFE)
- Embedded Methods: Select features as part of model training (e.g., Lasso).
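As a quick illustration, here is a minimal scikit-learn sketch of a filter method (chi-square scores) and a wrapper method (RFE); the Iris dataset and keeping k=2 features are arbitrary choices, not from the lecture:

```python
# Minimal sketch: filter vs. wrapper feature selection (assumed toy setup).
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter method: score each feature independently with a chi-square test
filter_selector = SelectKBest(score_func=chi2, k=2)
X_filtered = filter_selector.fit_transform(X, y)
print("Filter (chi2) kept features:", filter_selector.get_support())

# Wrapper method: recursively drop the weakest feature using a model
wrapper_selector = RFE(estimator=LogisticRegression(max_iter=1000),
                       n_features_to_select=2)
wrapper_selector.fit(X, y)
print("Wrapper (RFE) kept features:", wrapper_selector.get_support())
```

Note the trade-off summarized in the table below: the chi-square filter never trains a model, while RFE refits its estimator each time it drops a feature.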
📌 Comparison of Methods
| Method | Pros | Cons |
|---|---|---|
| Filter | Fast, independent of model | May keep redundant features |
| Wrapper | Finds best feature subset | Expensive, can overfit |
| Embedded | Optimized during training | May depend on specific model |
🏗 Feature Extraction
👉 Instead of selecting features, create new features.
🔹 Principal Component Analysis (PCA)
✔ Transforms high-dimensional data into fewer variables (Principal Components).
✔ Maximizes variance while minimizing information loss.
✔ Transforms correlated features into uncorrelated components.
📌 PCA Steps
1️⃣ Standardization: Normalize data.
2️⃣ Compute Covariance Matrix: Find feature relationships.
3️⃣ Find Eigenvalues & Eigenvectors: Identify principal components.
4️⃣ Sort Components: Rank based on importance.
5️⃣ Select Top Components: Keep only the most important ones.
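A small NumPy sketch of these five steps (the toy data matrix and keeping 2 components are assumptions for illustration):

```python
# Minimal PCA-by-hand sketch following the five steps above.
import numpy as np

X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 2.1],
              [2.2, 2.9, 0.9],
              [1.9, 2.2, 1.2],
              [3.1, 3.0, 0.3]])

# 1) Standardization
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Covariance matrix of the standardized features
cov = np.cov(Z, rowvar=False)

# 3) Eigenvalues & eigenvectors (eigh: the covariance matrix is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4) Sort components by eigenvalue, largest first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5) Keep the top 2 components and project the data onto them
components = eigvecs[:, :2]
X_reduced = Z @ components
print("Explained variance ratio:", eigvals[:2] / eigvals.sum())
```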
📌 Formula for Eigenvalues
Eigenvalues are the roots of the characteristic equation det(Cov − λI) = 0. For a 2×2 covariance matrix this reduces to a quadratic aλ² + bλ + c = 0, whose roots come from the quadratic formula:
\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
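As a worked example (the 2×2 matrix below is an assumed toy covariance matrix, not one from the lecture):
C = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \det(C - \lambda I) = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = 0
With a = 1, b = −4, c = 3 the formula gives λ = (4 ± √(16 − 12)) / 2, so λ₁ = 3 and λ₂ = 1; the eigenvector for λ₁ = 3 (the direction (1, 1)/√2) is the first principal component.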
🚀 Key Takeaway: PCA helps simplify data while preserving patterns!
🔄 Comparison of Feature Selection vs Feature Extraction
| Criteria | Feature Selection 🎯 | Feature Extraction 🛠️ |
|---|---|---|
| Approach | Select important features | Create new features |
| Output | Subset of original features | Transformed features |
| Common Methods | LDA, Chi-Square | PCA, t-SNE |
| Best For | Removing redundancy | Reducing complexity |
🎯 Key Takeaways
✅ Feature Selection → Remove irrelevant/redundant features.
✅ Feature Extraction (PCA) → Create new features while keeping essential information.
✅ Use PCA when feature correlation is high and dimension reduction is needed.
1. Dimensionality Reduction 📉
- What?: Reduce the number of features (dimensions) while retaining important information.
- Why?:
- Improve computation efficiency. ⚡
- Simplify visualization. 📊
- Remove irrelevant/redundant data. 🗑️
2. Feature Selection 🎯
- What?: Select a subset of original features.
- Methods:
- Filter Methods: Use statistical measures (e.g., correlation, mutual information).
- Wrapper Methods: Use a model to evaluate feature subsets (e.g., forward selection, backward elimination).
- Embedded Methods: Feature selection during model training (e.g., Lasso, Ridge).
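For the embedded case, a minimal sketch with Lasso inside scikit-learn's SelectFromModel (the diabetes dataset and alpha=0.1 are assumptions for illustration):

```python
# Minimal embedded-selection sketch: L1 regularization drives some
# coefficients to exactly zero, so selection happens during training.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
print("Kept features:", selector.get_support())
```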
3. Feature Extraction 🛠️
- What?: Create new features from original ones.
- PCA (Principal Component Analysis):
- Transforms data into uncorrelated components (PCs).
- Steps:
- Standardize data.
- Compute covariance matrix.
- Find eigenvalues and eigenvectors.
- Select top PCs (highest variance).
- Goal: Retain maximum variance with fewer dimensions.
4. PCA Steps 📝
- Standardize Data:
Z = \frac{X - \mu}{\sigma}
- Covariance Matrix:
\text{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
- Eigenvalues & Eigenvectors: Solve
\text{det}(\text{Covariance Matrix} - \lambda I) = 0
- Select Top PCs: Keep PCs with highest eigenvalues (variance).
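In practice these steps are usually delegated to a library. A minimal sketch with scikit-learn's PCA (standardizing first to match the formulas above; n_components=2 and the Iris data are arbitrary choices):

```python
# Minimal library-based PCA sketch (assumed toy setup).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
Z = StandardScaler().fit_transform(X)   # Z = (X - mu) / sigma

pca = PCA(n_components=2)
X_pca = pca.fit_transform(Z)            # project onto the top-2 PCs

print("Variance captured by each PC:", pca.explained_variance_ratio_)
print("Reduced shape:", X_pca.shape)    # (150, 2)
```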
5. Key Concepts 🔑
- Eigenvalues: Measure of variance captured by each PC.
- Eigenvectors: Directions of maximum variance.
- Covariance Matrix: Shows how features vary together.
- Mutual Information (MI): Measures dependency between features and target.
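A minimal sketch of an MI-based filter score with scikit-learn (the Iris data is an assumption; higher scores mean stronger dependency between a feature and the target):

```python
# Minimal mutual-information filter sketch (assumed toy setup).
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)
print("MI of each feature with the target:", mi.round(3))
```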
Mind Map 🧠
Dimensionality Reduction
├── Feature Selection
│ ├── Filter Methods (e.g., MI, Correlation)
│ ├── Wrapper Methods (e.g., Forward Selection)
│ └── Embedded Methods (e.g., Lasso)
└── Feature Extraction
├── PCA (Principal Component Analysis)
│ ├── Standardize Data
│ ├── Covariance Matrix
│ ├── Eigenvalues & Eigenvectors
│ └── Select Top PCs
└── Other Methods (e.g., t-SNE, UMAP)
Key Symbols 🔑
- X: Original data.
- Z: Standardized data.
- λ: Eigenvalues.
- v: Eigenvectors.
- Cov: Covariance matrix.
You’re ready! 🎉 Just remember PCA = reduce dimensions, Eigenvalues = variance, and Feature Selection = pick best features! 🚀