ML2 ‐ Lec (3)
1. Decision Trees 🌳
- What?: A tree-like model for classification/regression.
- Goal: Build the smallest possible tree that fits the data.
- Nodes: Test attributes.
- Branches: Attribute values.
- Leaves: Class labels or predictions.
2. ID3 Algorithm 🛠️
- Steps:
  - Start at the root.
  - Choose the best attribute (max info gain).
  - Split the data based on the attribute's values.
  - Repeat recursively for each branch.
- Stopping Criteria:
  - All examples in a branch are the same class.
  - No more attributes to split on.
  - If a branch has no examples, assign the majority class of the parent.
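A tiny runnable Python sketch of ID3 for categorical attributes (the dict-based dataset layout, the `label` key, and the toy data are illustrative assumptions, not from the lecture):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, attr):
    """Entropy(S) minus the weighted entropy of the subsets S_v."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r["label"])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r["label"] for r in rows]) - remainder

def id3(rows, attributes):
    """Returns a nested dict tree, or a class label at the leaves."""
    labels = [r["label"] for r in rows]
    if len(set(labels)) == 1 or not attributes:  # stopping criteria
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, a))  # max info gain
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree

data = [
    {"Outlook": "Sunny", "Windy": "no",  "label": "No"},
    {"Outlook": "Sunny", "Windy": "yes", "label": "No"},
    {"Outlook": "Rainy", "Windy": "no",  "label": "Yes"},
    {"Outlook": "Rainy", "Windy": "yes", "label": "No"},
]
print(id3(data, ["Outlook", "Windy"]))
# {'Outlook': {'Sunny': 'No', 'Rainy': {'Windy': {'no': 'Yes', 'yes': 'No'}}}}
```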
3. Entropy & Information Gain 📊
- Entropy: Measures impurity/uncertainty.
Entropy(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-
  - p_+: Proportion of positive examples.
  - p_-: Proportion of negative examples.
- Information Gain:
Gain(S, A) = Entropy(S) - \sum_{v} \frac{|S_v|}{|S|} Entropy(S_v)
  - A: The attribute being tested.
  - S_v: Subset of the data for value v of A.
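Worked example with the classic PlayTennis counts (9 positive, 5 negative; Wind splits them 8/6 with sub-entropies 0.811 and 1.0):
Entropy(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940
Gain(S, Wind) = 0.940 - \frac{8}{14}(0.811) - \frac{6}{14}(1.0) \approx 0.048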
4. Overfitting & Pruning ✂️
- Overfitting: Tree too complex → fits noise.
- Pruning:
  - Pre-pruning: Stop early (e.g., min samples per leaf).
  - Post-pruning: Grow full tree, then remove nodes.
- Goal: Simplify tree to improve generalization.
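A quick sketch of both pruning styles, assuming scikit-learn is available (the iris dataset and the parameter values are just stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: constrain growth up front with hyperparameters.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: compute the cost-complexity path of the full tree,
# then refit with a chosen alpha to cut weak subtrees back.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
post = DecisionTreeClassifier(random_state=0,
                              ccp_alpha=path.ccp_alphas[-2]).fit(X, y)

print(pre.get_depth(), post.get_depth())  # the pruned trees are shallower
```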
5. Extensions 🔄
- Continuous Attributes: Discretize using thresholds.
- Missing Values: Use most frequent value or probability estimates.
- Cost-Sensitive Attributes: Modify gain to account for feature costs.
- Regression Trees: Predict numeric values (average in leaves).
6. Key Concepts 🔑
- Gini Index: Alternative to entropy for impurity.
Gini(S) = 1 - \sum p_i^2
- Gain Ratio: Adjusts info gain to penalize many-valued attributes.
GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}
- Multivariate Trees: Use linear combinations of attributes.
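A minimal sketch of the two impurity measures above (function names are mine; the 9+/5- counts and the Wind split reuse the worked example from section 3):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def gain_ratio(gain, subset_sizes):
    """Info gain normalized by SplitInformation (entropy of the split itself)."""
    total = sum(subset_sizes)
    split_info = -sum((s / total) * math.log2(s / total)
                      for s in subset_sizes if s)
    return gain / split_info

print(gini(["+"] * 9 + ["-"] * 5))  # ~0.459
print(gain_ratio(0.048, [8, 6]))    # ~0.049 for the Wind split
```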
Mind Map 🧠
Decision Trees
├── ID3 Algorithm
│ ├── Entropy (impurity measure)
│ ├── Information Gain (choose best attribute)
│ └── Stopping Criteria (pure branch, no attributes)
├── Overfitting
│ ├── Pre-pruning (stop early)
│ └── Post-pruning (grow full, then cut)
└── Extensions
├── Continuous Attributes (discretize)
├── Missing Values (use most frequent)
├── Regression Trees (predict numeric values)
└── Multivariate Trees (linear combinations)
Key Symbols 🔑
- S: Dataset.
- A: Attribute.
- p_+: Proportion of positive examples.
- p_-: Proportion of negative examples.
- Gain(S, A): Information gain for attribute A.
- Gini(S): Gini impurity for dataset S.
You’re ready! 🎉 Just remember Decision Trees = split data based on attributes, Entropy = measure of impurity, and Pruning = avoid overfitting! 🚀
1. Decision Trees Extensions 🌳
- Gain Ratio: Adjusts info gain to penalize attributes with many values.
GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}
- Continuous Attributes: Discretize using thresholds (e.g., Temperature > 54).
- Missing Values: Use most frequent value or probability estimates.
- Cost-Sensitive Attributes: Modify gain to account for feature costs.
Gain2(S, A) = \frac{Gain(S, A)^2}{Cost(S, A)}
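A toy comparison under this criterion (the gain and cost numbers are made up):

```python
def gain2(gain, cost):
    """Cost-sensitive criterion from above: squared gain per unit cost."""
    return gain ** 2 / cost

# A cheap, slightly weaker attribute can beat an expensive, stronger one:
print(gain2(0.40, cost=2))   # 0.08
print(gain2(0.60, cost=10))  # 0.036
```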
2. Multiclass Classification 🎯
- Entropy for Multiple Classes:
Entropy(S) = -\sum_{i=1}^c p_i \log_2 p_i
  - c: Number of classes.
  - p_i: Proportion of class i.
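The same entropy helper as in the ID3 sketch covers this case; a quick sanity check (toy labels of my own):

```python
import math
from collections import Counter

def entropy(labels):
    """Works for any number of classes, not just binary."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# A uniform distribution over c classes maximizes entropy at log2(c):
print(entropy(["a", "b", "c"] * 5))  # ~1.585 = log2(3)
```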
3. Regression Trees 📈
- Goal: Predict continuous values.
- Splitting Criterion: Minimize variance (standard deviation reduction).
SDR(S, A) = SD(S) - \sum_{v} \frac{|S_v|}{|S|} SD(S_v)
- Prediction: Mean value in leaf nodes.
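A short sketch of the SDR criterion above (the target values are hypothetical):

```python
import statistics

def sdr(parent, subsets):
    """Standard deviation reduction: SD(S) minus the weighted SD of subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted

# This split cleanly separates low from high target values:
parent = [10, 12, 30, 32]
print(sdr(parent, [[10, 12], [30, 32]]))  # ~9.05, a strong split
```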
4. CART (Classification and Regression Trees) 🛠️
- Gini Index: Measures impurity.
Gini(S) = 1 - \sum_{i=1}^c p_i^2
- Weighted Gini:
Gini_{split} = \frac{N_1}{N} Gini(S_1) + \frac{N_2}{N} Gini(S_2)
- Regression: Use Mean Squared Error (MSE) for splitting.
MSE = \frac{1}{N} \sum (y_i - \hat{y})^2
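A small sketch tying the CART formulas above together (labels and targets are toy values):

```python
from collections import Counter

def gini(labels):
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_split(s1, s2):
    """Weighted Gini of a binary split, as in the formula above."""
    n = len(s1) + len(s2)
    return len(s1) / n * gini(s1) + len(s2) / n * gini(s2)

def mse(ys):
    """Mean squared error around the leaf mean (the regression criterion)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

print(gini_split(["+", "+"], ["-", "-"]))       # 0.0 for a perfect split
print(gini_split(["+", "+", "-"], ["-", "-"]))  # ~0.267, mixed left branch
print(mse([10, 12, 14]))                        # ~2.67, variance around 12
```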
5. Key Concepts 🔑
- Gain Ratio: Penalizes attributes with many values.
- Continuous Attributes: Discretize using thresholds.
- Missing Values: Use most frequent value or probability estimates.
- Regression Trees: Predict numeric values (mean in leaves).
- CART: Uses Gini Index for classification, MSE for regression.
Mind Map 🧠
Decision Trees Extensions
├── Gain Ratio (penalize many-valued attributes)
├── Continuous Attributes (discretize using thresholds)
├── Missing Values (use most frequent or probability)
├── Cost-Sensitive Attributes (modify gain with cost)
├── Multiclass Classification (entropy for multiple classes)
└── Regression Trees
├── Splitting Criterion (minimize variance)
├── Prediction (mean in leaves)
└── CART (Gini Index for classification, MSE for regression)
Key Symbols 🔑
- S: Dataset.
- A: Attribute.
- Gain(S, A): Information gain for attribute A.
- Gini(S): Gini impurity for dataset S.
- MSE: Mean Squared Error (for regression).
You’re ready! 🎉 Just remember Decision Trees = split data based on attributes, Gain Ratio = penalize many-valued attributes, and Regression Trees = predict numeric values! 🚀