Decision Tree


Definition of Decision Tree and Tree Structure

A tree is an abstract data type that stores elements hierarchically.

A tree consists of:

  • A set of nodes
  • A set of edges, each of which connects a pair of nodes

Relationships between nodes:

  • If a node is connected to nodes that are directly below it in the tree, it is referred to as their parent node and they are referred to as its child nodes.
  • Each node can have at most one parent node.
  • Nodes with the same parent are siblings.

Type of nodes:

  • A leaf node is a node without children.
  • An interior node is a node with one or more children.
  • Each node in the tree is the root of a smaller tree (a subtree).

Path, Depth, Level and Height:

  • There is exactly one path (one sequence of edges) connecting each node to the root.
  • Depth of a node = # of edges on the path from it to the root
  • Nodes with the same depth form a level of the tree.
  • The height of a tree is the maximum depth of its nodes.
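
These structural definitions can be made concrete with a short sketch. The `Node` class below is purely illustrative (it is not taken from this repository's code) and shows parent/child links, the leaf test, and the recursive definitions of depth and height:

```python
class Node:
    """A minimal tree node: at most one parent, any number of children."""
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent            # at most one parent node
        self.children = []              # nodes directly below this one
        if parent is not None:
            parent.children.append(self)

    def is_leaf(self):
        return not self.children        # a leaf node has no children

    def depth(self):
        # number of edges on the unique path from this node to the root
        return 0 if self.parent is None else 1 + self.parent.depth()

    def height(self):
        # maximum depth within the subtree rooted at this node
        return 0 if self.is_leaf() else 1 + max(c.height() for c in self.children)

root = Node("A")
b, c = Node("B", root), Node("C", root)  # B and C are siblings
d = Node("D", b)                         # B is interior; C and D are leaves
print(d.depth(), root.height())          # 2 2
```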

A Decision Tree is one of the most popular tools for classification, prediction and decision making. It uses a tree-like model, where each internal node denotes a test on an attribute, each branch represents an outcome of the test and each leaf node (terminal node) holds a class label.

source: "Decision Tree", GeeksforGeeks, https://www.geeksforgeeks.org/decision-tree/

Decision Tree in Machine Learning

Two important steps in creating a decision tree model are:

  • Induction is where we actually build the tree, i.e. set all of the hierarchical decision boundaries based on our data. Because of the nature of how decision trees are trained, they can be prone to major overfitting.
  • Pruning is the process of removing unnecessary structure from a decision tree, effectively reducing its complexity to combat overfitting, with the added bonus of making it even easier to interpret.

Induction:

  1. Start with the training dataset, which should have some feature variables and classification or regression output.
  2. Determine the "best feature" in the dataset to split the data on.
  3. Split the data into subsets that contain the possible values for this best feature.
  4. Recursively generate new tree nodes using the subsets of data created above. Keep splitting until a stopping point is reached, e.g. every example in a subset has the same class or a maximum depth is hit (see the sketch below).
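
A minimal sketch of these four steps, assuming numeric features, binary splits, and Gini impurity as the measure of the "best feature" (the criterion is an assumption here; information gain or others would work the same way):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Step 2: find the (feature, threshold) split with the lowest weighted impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left  = [y for r, y in zip(rows, labels) if r[f] <  t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, depth=0, max_depth=3):
    """Steps 3-4: split the data into subsets and recurse until a stopping point."""
    split = best_split(rows, labels)
    if split is None or depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class label
    f, t = split
    left  = [(r, y) for r, y in zip(rows, labels) if r[f] <  t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return {"feature": f, "threshold": t,
            "left":  build(*map(list, zip(*left)),  depth + 1, max_depth),
            "right": build(*map(list, zip(*right)), depth + 1, max_depth)}

# toy data: one numeric feature, two classes
tree = build([[2.0], [3.0], [10.0], [11.0]], ["a", "a", "b", "b"])
print(tree)   # {'feature': 0, 'threshold': 10.0, 'left': 'a', 'right': 'b'}
```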

Pruning:

  • Tree pruning is a technique that leverages this splitting redundancy, i.e. splits that add complexity without improving accuracy, to remove (prune) the unnecessary splits in our tree.
  • Pruning compresses part of the tree from strict and rigid decision boundaries into ones that are smoother and generalise better, effectively reducing the tree's complexity. The complexity of a decision tree is defined as the number of splits in the tree.
  • A simple yet highly effective pruning method is to go through each node in the tree and evaluate the effect of removing it on the cost function (see the sketch after the source below).

source: "A Guide to Decision Trees for Machine Learning and Data Science", George Seif, https://towardsdatascience.com/a-guide-to-decision-trees-for-machine-learning-and-data-science-fe2607241956

Types

Decision Tree

A Brief Demonstration: https://miro.medium.com/max/875/0*cant-HQdfMju-GxG