Decision Tree in R : Flu Classification - clumsyspeedboat/Decision-Tree-Neo4j GitHub Wiki
R code
Python code
Decision Tree
We implemented the following decision tree algorithms to compare their accuracy:
- CART
- C4.5
- C5.0
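The three algorithms differ mainly in how they score a candidate split. As a rough illustration, CART conventionally uses Gini impurity for classification, while C4.5 uses the entropy-based gain ratio (C5.0 is its successor and is also information-based). The Python sketch below is not the repository's R code. The function names and the toy labels are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, groups):
    """Information gain of a split divided by its split information.

    `groups` is the partition of `labels` produced by a candidate attribute.
    """
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups if g)
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups if g)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy labels, not taken from the dataset.
labels = ["pos", "pos", "neg", "neg"]
perfect_split = [["pos", "pos"], ["neg", "neg"]]

print(gini(labels))                       # impurity before splitting
print(gain_ratio(labels, perfect_split))  # score of a split that separates the classes
```

A lower Gini impurity after a split, or a higher gain ratio, indicates a split that separates the classes better.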
Acknowledgement
https://github.com/yoshihiko1218/COVID19ML
The dataset initially consisted of 1485 instances × 51 variables and was reduced to 13 variables. It contains many null values, which are hard to handle because the symptom variables are mostly entered as strings. We dropped most of the string variables that had many null values. Even after reducing the dataset to 13 variables, many null values remained. We replaced all remaining null values
- with "0" in the numeric variables
- with "unknown" in the string variables
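The replacement rule above can be sketched in a few lines of Python. This is a minimal illustration and not the repository's R code. The column names `age` and `symptom` and the helper `fill_missing` are hypothetical, and missing values are assumed to be represented as `None`.

```python
def fill_missing(rows, numeric_cols, string_cols):
    """Return copies of the rows with None replaced by 0 or "unknown"."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in numeric_cols:
            if new_row.get(col) is None:
                new_row[col] = 0
        for col in string_cols:
            if new_row.get(col) is None:
                new_row[col] = "unknown"
        cleaned.append(new_row)
    return cleaned


# Hypothetical rows; the real dataset has 13 variables.
rows = [
    {"age": 34, "symptom": None},
    {"age": None, "symptom": "cough"},
]
cleaned = fill_missing(rows, numeric_cols=["age"], string_cols=["symptom"])
print(cleaned)
```

Note that filling numeric gaps with 0 makes a missing value indistinguishable from a true zero, which is worth keeping in mind when reading the trees.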
We used 3/4 of the Flu Classification dataset as the training dataset and 1/3 as the testing dataset.
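A random train/test split can be sketched as follows. This Python snippet is an illustration, not the repository's R code. It assumes a training fraction of 3/4 with the remainder held out for testing; the helper name `split_rows` and the seed are arbitrary choices.

```python
import random


def split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle the rows reproducibly, then cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 1485 instances of the dataset.
train, test = split_rows(range(1485))
print(len(train), len(test))  # 1113 and 372 under the assumed 3/4 fraction
```

Fixing the seed keeps the split reproducible, so accuracy figures from different algorithms are compared on the same test rows.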
CART - Classification & Regression Trees
- depth = 8
- leaf nodes = 15
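For readers unfamiliar with these two figures, the sketch below shows one way to count depth and leaf nodes on a tree stored as nested dictionaries. It is a Python illustration, not the repository's R code. The tree layout, the feature and class names, and the convention that a lone leaf has depth 0 are all assumptions; some tools count levels instead of edges.

```python
def depth(node):
    """Number of edges on the longest root-to-leaf path."""
    children = node.get("children", [])
    if not children:
        return 0
    return 1 + max(depth(child) for child in children)


def leaf_count(node):
    """Number of terminal nodes, i.e. nodes without children."""
    children = node.get("children", [])
    if not children:
        return 1
    return sum(leaf_count(child) for child in children)


# Hypothetical toy tree, unrelated to the trees reported here.
toy_tree = {
    "split": "feature_a",
    "children": [
        {"label": "class_1"},
        {
            "split": "feature_b",
            "children": [{"label": "class_1"}, {"label": "class_2"}],
        },
    ],
}

print(depth(toy_tree), leaf_count(toy_tree))  # 2 3
```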
Confusion Matrix: Prediction on Test Dataset
Accuracy = 91.9 %
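Accuracy is the share of test instances on the diagonal of the confusion matrix. The Python sketch below illustrates the calculation and is not the repository's R code. The counts are hypothetical and do not reproduce the matrices reported on this page.

```python
def accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 2x2 matrix: rows are actual classes, columns are predictions.
toy_confusion = [
    [40, 10],
    [5, 45],
]
print(f"Accuracy = {accuracy(toy_confusion):.1%}")  # Accuracy = 85.0%
```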
Decision Tree (C4.5)
- depth = 10
- leaf nodes = 24
Confusion Matrix: Prediction on Test Dataset
Accuracy = 92.2 %
Decision Tree (C5.0)
- depth = 12
- leaf nodes = 26
Confusion Matrix: Prediction on Test Dataset
Accuracy = 93.8 %