Decision Tree in Python : Heart Failure Prediction - clumsyspeedboat/Decision-Tree-Neo4j GitHub Wiki
Decision Tree
We implemented the following node-splitting criteria in our decision trees to compare their accuracy:
- Gini Index
- Information Gain (Entropy)
We used two thirds of the Heart Failure Prediction dataset as the training set and the remaining third as the test set.
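The split described above can be sketched with scikit-learn's `train_test_split`. The dataset is not included on this page, so the example below uses a synthetic stand-in of the same shape (the sample count and feature count are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Heart Failure Prediction dataset
# (299 rows, 12 features assumed for illustration).
X, y = make_classification(n_samples=299, n_features=12, random_state=42)

# test_size=1/3 reserves one third of the rows for testing,
# leaving two thirds for training, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=42
)

print(len(X_train), len(X_test))
```

Fixing `random_state` makes the split reproducible, so the reported accuracies can be recomputed exactly on reruns.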
Gini Index
- depth = 6
- leaf nodes = 16
Confusion Matrix: Prediction on Test Dataset
Accuracy = 72/97 = 74.23 %
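A minimal sketch of the Gini-based tree and its evaluation, again on synthetic stand-in data (the exact depth, leaf count, and accuracy reported above come from the real dataset and will differ here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic stand-in data (shape assumed for illustration).
X, y = make_classification(n_samples=299, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

# criterion="gini" selects the Gini index as the splitting criterion.
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)

print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print(cm)
print("accuracy:", acc)
```

`get_depth()` and `get_n_leaves()` report the tree statistics listed above, and the accuracy is the trace of the confusion matrix divided by the number of test samples.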
Information Gain (Entropy)
- depth = 9
- leaf nodes = 26
Confusion Matrix: Prediction on Test Dataset
Accuracy = 72/97 = 74.23 %
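The entropy-based tree differs from the Gini version only in the `criterion` argument; a minimal sketch on the same synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (shape assumed for illustration).
X, y = make_classification(n_samples=299, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

# criterion="entropy" splits on information gain instead of Gini impurity.
clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf_entropy.fit(X_train, y_train)

acc_entropy = accuracy_score(y_test, clf_entropy.predict(X_test))
print("depth:", clf_entropy.get_depth(), "leaves:", clf_entropy.get_n_leaves())
print("accuracy:", acc_entropy)
```

As the matching accuracies above suggest, the two criteria often produce very similar predictions even when the tree shapes (depth, leaf count) differ.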
Pruning
We realized the decision trees in Python needed pruning: a tree with many leaf nodes and a large depth is prone to overfitting. A function was used to determine the optimal value for the "max_depth" parameter. The final pruned trees were made comparable to the decision tree in R.
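The tuning function mentioned above is not shown on this page; a hypothetical sketch of one approach is to refit the tree over a range of `max_depth` values and keep the depth with the best test accuracy (the helper name and depth range are assumptions, and synthetic stand-in data is used again):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (shape assumed for illustration).
X, y = make_classification(n_samples=299, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

def best_max_depth(X_tr, y_tr, X_te, y_te, depths=range(1, 11)):
    """Fit one tree per candidate depth and return the depth
    with the highest test-set accuracy (hypothetical helper)."""
    scores = {}
    for d in depths:
        tree = DecisionTreeClassifier(
            criterion="gini", max_depth=d, random_state=0
        )
        tree.fit(X_tr, y_tr)
        scores[d] = accuracy_score(y_te, tree.predict(X_te))
    return max(scores, key=scores.get)

best_depth = best_max_depth(X_train, y_train, X_test, y_test)
print("best max_depth:", best_depth)
```

Capping `max_depth` is a pre-pruning strategy: the shallower tree trades some training accuracy for better generalization, which is consistent with the pruned accuracies reported below exceeding the unpruned ones.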
Pruned Decision Tree: Gini Index
- depth = 4
- leaf nodes = 4
Accuracy = 81.4 %
Pruned Decision Tree: Information Gain (Entropy)
- depth = 4
- leaf nodes = 7
Accuracy = 78.4 %