Decision Tree in Python: Flu Classification - clumsyspeedboat/Decision-Tree-Neo4j GitHub Wiki
Decision Tree
We have implemented the following decision tree algorithms in order to compare their accuracy:
- CART
- C4.5
- C5.0
Acknowledgement
https://github.com/yoshihiko1218/COVID19ML
The dataset initially consisted of 1485 instances and 51 variables, which we reduced to 13 variables. It contains many null values, which are hard to handle because most of the symptom variables are stored as strings, so we removed most of the string variables that had many null values. Even after reducing the dataset to 13 variables, many null values remained. We replaced all remaining null values:
- with "0" in the numeric variables
- with "unknown" in the string variables
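The null-handling rule above can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes each record is a dict, that a null arrives as `None` or a float NaN, and that the set of numeric columns is known in advance. The helper names and the column names (`Age`, `Fever`, `Diagnosis`) are hypothetical.

```python
import math
from typing import Any, Dict, List, Set


def is_null(value: Any) -> bool:
    """Treat None and float NaN as missing (assumed encodings of a null value)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_nulls(rows: List[Dict[str, Any]], numeric_cols: Set[str]) -> List[Dict[str, Any]]:
    """Return new rows with nulls filled: 0 in numeric columns, "unknown" elsewhere."""
    cleaned = []
    for row in rows:
        # Build a fresh dict so the input rows are left unchanged.
        cleaned.append({
            col: (0 if col in numeric_cols else "unknown") if is_null(value) else value
            for col, value in row.items()
        })
    return cleaned


# Hypothetical column names and values, for illustration only.
rows = [
    {"Age": None, "Fever": "yes", "Diagnosis": "H1N1"},
    {"Age": 34.0, "Fever": float("nan"), "Diagnosis": "COVID19"},
]
print(impute_nulls(rows, numeric_cols={"Age"}))
```

With pandas, the same idea is usually expressed with `DataFrame.fillna`, passing a dict that maps each column name to its fill value.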
We used three quarters (3/4) of the Flu Classification dataset as the training set and the remaining quarter (1/4) as the test set.
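A reproducible 3/4 : 1/4 split can be sketched as follows, assuming the records fit in memory. The helper `split_train_test` and its fixed seed are illustrative choices, not taken from this page.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(rows: Sequence[T], train_fraction: float = 0.75,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of rows and split it into (train, test) partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # fixed seed -> reproducible split
    cut = int(len(shuffled) * train_fraction)  # floor, so any remainder goes to test
    return shuffled[:cut], shuffled[cut:]


# 1485 instances, as in the dataset described above.
train, test = split_train_test(range(1485))
print(len(train), len(test))  # 1113 372
```

scikit-learn's `sklearn.model_selection.train_test_split` with `test_size=0.25` performs a comparable shuffled split, and its `stratify` argument keeps the class proportions equal in both parts; the page does not say whether it was used here.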
Gini Index
- depth = 16
- leaf nodes = 105
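For the Gini-index tree, each candidate split is scored by the Gini impurity of the child nodes it produces, G = 1 - Σ p_k² over the class proportions p_k, and the split with the lowest size-weighted impurity is preferred. The two criteria on this page correspond to the `criterion="gini"` and `criterion="entropy"` options of scikit-learn's `DecisionTreeClassifier`, though the page does not state which library was used. A minimal sketch of the score, with hypothetical class labels:

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Impurity of a binary split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


parent = ["flu", "flu", "flu", "covid", "covid", "covid"]
print(gini(parent))                           # 0.5
print(weighted_gini(parent[:3], parent[3:]))  # 0.0 (a perfect split)
```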
Confusion Matrix: Prediction on Test Dataset
Accuracy = 86.6%
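The accuracy reported under each confusion matrix is the number of test instances on the matrix diagonal (correct predictions) divided by the total number of test instances. The counts below are invented for illustration and are not the counts from this page's confusion matrices.

```python
from typing import Sequence


def accuracy_from_confusion(matrix: Sequence[Sequence[int]]) -> float:
    """Accuracy = correctly classified (diagonal) / all test predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical 2-class counts: rows = true class, columns = predicted class.
cm = [[50, 5],
      [10, 35]]
print(accuracy_from_confusion(cm))  # 0.85
```

The result does not depend on whether rows hold the true or the predicted class, since only the diagonal and the grand total are used.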
Information Gain (Entropy)
- depth = 23
- leaf nodes = 106
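The entropy-based tree instead picks the split that maximizes information gain: the entropy of the parent node minus the size-weighted entropy of its child nodes. A minimal sketch, again with hypothetical labels:

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: sum_k p_k * log2(1 / p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent: Sequence[Hashable],
                     children: Sequence[Sequence[Hashable]]) -> float:
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["flu"] * 4 + ["covid"] * 4
print(information_gain(parent, [parent[:4], parent[4:]]))  # 1.0 (pure children)
```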
Confusion Matrix: Prediction on Test Dataset
Accuracy = 90.9%
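The depth and leaf-node counts reported for both criteria can be read off a fitted tree with a simple recursion. The sketch below assumes a hypothetical nested-dict node format in which every internal node has both a left and a right child, and it counts depth in edges, so a tree consisting of a single leaf has depth 0.

```python
from typing import Any, Dict

# Hypothetical node format: internal nodes carry "feature", "left" and "right";
# leaves carry only a "label".
Node = Dict[str, Any]


def is_leaf(node: Node) -> bool:
    return "left" not in node and "right" not in node


def depth(node: Node) -> int:
    """Longest root-to-leaf path, counted in edges (a lone leaf has depth 0)."""
    if is_leaf(node):
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))


def n_leaves(node: Node) -> int:
    """Number of leaf (terminal) nodes in the tree."""
    if is_leaf(node):
        return 1
    return n_leaves(node["left"]) + n_leaves(node["right"])


tree = {
    "feature": "Fever",
    "left": {"label": "flu"},
    "right": {
        "feature": "Cough",
        "left": {"label": "flu"},
        "right": {"label": "covid"},
    },
}
print(depth(tree), n_leaves(tree))  # 2 3
```

If the trees were built with scikit-learn (the page does not say which library was used), a fitted `DecisionTreeClassifier` reports the same quantities through its `get_depth()` and `get_n_leaves()` methods.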