Decision Tree - lorenzodelmedico/AI-knowledge GitHub Wiki
A supervised machine learning algorithm, usable for both classification and regression.
It classifies data by following a sequence of if-else conditions on feature values.
Important terminology:
- Root Node: The topmost node, where the data is first divided into two or more sets. The feature tested at this node is selected using an attribute selection technique.
- Branch or Sub-Tree: A part of the entire decision tree.
- Splitting: Dividing a node into two or more sub-nodes based on an if-else condition.
- Decision Node: A sub-node that is itself split into further sub-nodes.
- Leaf or Terminal Node: A node at the end of the tree that cannot be split any further.
- Pruning: Removing a sub-tree from the tree, typically to reduce overfitting.
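The terminology above can be made concrete with a toy tree written as plain if-else rules. This is an illustrative sketch only: the feature names and thresholds (loosely inspired by the Iris dataset) are my own, not from the source.

```python
# A toy decision tree as nested if-else conditions:
# the first test is the root node, the nested test is a decision node,
# each `else` branch is a sub-tree, and each returned string is a leaf.
def classify(petal_length, petal_width):
    if petal_length < 2.5:           # root node split
        return "setosa"              # leaf node
    else:                            # branch / sub-tree
        if petal_width < 1.8:        # decision node split
            return "versicolor"      # leaf node
        else:
            return "virginica"       # leaf node
```

Pruning, in these terms, would mean deleting the inner if-else and replacing it with a single leaf.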
How it works:
The root node is selected based on the results of an Attribute Selection Measure (such as the Gini index, information gain (IG), or gain ratio); the same measure is then applied recursively to choose each subsequent split.
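The two most common attribute selection measures can be sketched in a few lines of standard-library Python. The function names are mine; the formulas are the standard Gini impurity and entropy-based information gain.

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum p * log2(p) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted
```

A split that separates the classes perfectly has information gain equal to the parent's entropy; the tree-growing algorithm greedily picks the split with the highest gain (or lowest weighted Gini impurity).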
Additional information:
Information gain and gain ratio are suited to discrete (categorical) features, while the Gini index is often used for continuous features. Decision trees can overfit the data and are very sensitive to small changes in the dataset, but they are good at capturing non-linear patterns.
Target variable
Classification trees are used for a discrete target variable, while regression trees are used for a continuous one.
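The distinction shows up most clearly at the leaves: a classification tree's leaf predicts the majority class of the training samples that reach it, while a regression tree's leaf predicts their mean. A minimal stdlib sketch (function names are mine):

```python
from collections import Counter

def classification_leaf(labels):
    # Discrete target: predict the most common class among samples at the leaf.
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(values):
    # Continuous target: predict the mean of the values at the leaf.
    return sum(values) / len(values)
```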
Source:
https://www.youtube.com/watch?v=ZVR2Way4nwQ&t=157s
https://towardsai.net/p/programming/decision-trees-explained-with-a-practical-example-fe47872d3b53