Decision Tree in Python: Heart Failure Prediction

  • Python code
  • R code
  • Data Analysis Results


Decision Tree

We implemented the following node-splitting criteria in our Decision Trees to compare their accuracy:

  • Gini Index
  • Information Gain (Entropy)

We used two thirds of our Heart Failure Prediction dataset for training and the remaining one third for testing (see the sketch below).
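A minimal sketch of this split with scikit-learn; the CSV file name and the "DEATH_EVENT" target column are assumptions based on the common Kaggle release of this dataset, not details taken from this wiki:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Heart Failure Prediction dataset. The file name and the
# "DEATH_EVENT" target column are assumptions (Kaggle's release of the
# dataset uses these names); adjust them to match the actual data.
df = pd.read_csv("heart_failure_clinical_records_dataset.csv")
X = df.drop(columns=["DEATH_EVENT"])
y = df["DEATH_EVENT"]

# Hold out one third of the rows for testing and train on the other
# two thirds, matching the (2/3, 1/3) split described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42
)
```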


Gini Index

  • depth = 6
  • leaf nodes = 16

Confusion Matrix: Prediction on Test Dataset

Accuracy = 72/97 = 74.23 %
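Continuing from the split above, a tree like this can be grown and evaluated roughly as follows (a sketch; the random_state and the default hyperparameters are assumptions, so the exact depth and leaf count may differ):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Grow an unpruned tree that splits nodes on the Gini index.
gini_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
gini_tree.fit(X_train, y_train)

# Read the structure of the fitted tree: depth and number of leaf nodes.
print("depth:", gini_tree.get_depth())
print("leaf nodes:", gini_tree.get_n_leaves())

# Confusion matrix and accuracy of the predictions on the test set.
y_pred = gini_tree.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
```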


Information Gain (Entropy)

  • depth = 9
  • leaf nodes = 26

Confusion Matrix: Prediction on Test Dataset

Accuracy = 72/97 = 74.23 %
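The entropy-based tree only changes the criterion argument; the rest of the sketch above carries over unchanged:

```python
# Same procedure, but split nodes on information gain (entropy).
entropy_tree = DecisionTreeClassifier(criterion="entropy", random_state=42)
entropy_tree.fit(X_train, y_train)

y_pred = entropy_tree.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
```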


Pruning

We realized that the decision trees in Python needed pruning, since a tree with many leaf nodes and a large depth is prone to overfitting. A function was used to determine the optimal value for the pruning parameter "max_depth". The final pruned trees have been made comparable to the Decision Tree in R.
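The wiki does not show that function, but one way to write it is sketched below, assuming the optimal max_depth is the one that maximizes accuracy on the test set (both the selection rule and the candidate range are assumptions):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def best_max_depth(X_train, y_train, X_test, y_test,
                   criterion="gini", candidates=range(1, 11)):
    """Fit one tree per candidate max_depth and return the depth with
    the highest test-set accuracy (selection rule is an assumption)."""
    best_depth, best_acc = None, 0.0
    for depth in candidates:
        tree = DecisionTreeClassifier(criterion=criterion,
                                      max_depth=depth, random_state=42)
        tree.fit(X_train, y_train)
        acc = accuracy_score(y_test, tree.predict(X_test))
        if acc > best_acc:
            best_depth, best_acc = depth, acc
    return best_depth, best_acc
```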


Pruned Decision Tree: Gini Index

  • depth = 4
  • leaf nodes = 4

Accuracy = 81.4 %


Pruned Decision Tree: Information Gain (Entropy)

  • depth = 4
  • leaf nodes = 7

Accuracy = 78.4 %
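Refitting the pruned trees is then a matter of passing the chosen depth back to the classifier (a sketch; max_depth=4 matches the pruned depths reported above):

```python
# Refit both trees with the pruned depth and report their structure
# and test-set accuracy.
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=4,
                                  random_state=42)
    tree.fit(X_train, y_train)
    acc = accuracy_score(y_test, tree.predict(X_test))
    print(f"{criterion}: depth={tree.get_depth()}, "
          f"leaves={tree.get_n_leaves()}, accuracy={acc:.3f}")
```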