Decision Tree in Python : Flu Classification - clumsyspeedboat/Decision-Tree-Neo4j GitHub Wiki

Python code

R code


Decision Tree

We implemented the following decision tree algorithms to compare their accuracy:

  • CART
  • C4.5
  • C5.0

Acknowledgement

https://github.com/yoshihiko1218/COVID19ML

The dataset initially consisted of 1485 instances × 51 variables and was reduced to 13 variables. It contains many null values, which are hard to handle because the symptom variables are mostly entered as strings. We dropped most of the string variables that had many null values. Even after cutting the dataset down to 13 variables, many null values remained. We replaced all remaining null values

  • with "0" in the numeric variables
  • with "unknown" in the string variables
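
The replacement rule above can be sketched in plain Python. This is a minimal illustration, not the code used for the project: the column names (`age`, `fever`, `cough`), the sample rows, and the helper `fill_nulls` are all hypothetical, and empty strings are assumed to count as null.

```python
# Hypothetical rows; the column names are illustrative, not taken from the dataset.
rows = [
    {"age": 34.0, "fever": "yes", "cough": None},
    {"age": None, "fever": None, "cough": "dry"},
]

# Assumption: the set of numeric columns is known in advance.
NUMERIC_COLUMNS = {"age"}


def fill_nulls(row, numeric_columns):
    """Return a copy of row with nulls replaced:
    0 in numeric columns, "unknown" in string columns."""
    filled = {}
    for column, value in row.items():
        if value is None or value == "":
            filled[column] = 0 if column in numeric_columns else "unknown"
        else:
            filled[column] = value
    return filled


cleaned = [fill_nulls(row, NUMERIC_COLUMNS) for row in rows]
```

If the data is loaded with pandas instead, `DataFrame.fillna` applied per column would be the usual way to get the same effect.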

We used 3/4 of the Flu Classification dataset as the training set and 1/3 as the testing set.
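
A shuffled train/test split can be sketched as follows. The fractions stated above do not sum to one, so this sketch assumes a 75/25 split. The function name `train_test_split_rows`, the seed, and the use of integers in place of real rows are all assumptions made for illustration.

```python
import random


def train_test_split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 1485 is the instance count given above; integers stand in for real rows.
train, test = train_test_split_rows(range(1485))
print(len(train), len(test))  # 1113 372
```

Fixing the seed keeps the split stable between runs, so the accuracy figures of the different trees are measured on the same test rows.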


Gini Index

  • depth = 16
  • leaf nodes = 105
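
For reference, the Gini criterion scores a node by its impurity, 1 - Σ p_k², where p_k is the share of class k in the node, and it prefers the split with the lowest weighted impurity of the child nodes. The sketch below shows only that calculation. The class labels `"flu"` and `"not_flu"` are illustrative assumptions, and this is not the tree-building code used for the results here.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def weighted_gini(left_labels, right_labels):
    """Impurity of a binary split, weighted by the size of each child."""
    total = len(left_labels) + len(right_labels)
    return (len(left_labels) / total) * gini_impurity(left_labels) + (
        len(right_labels) / total
    ) * gini_impurity(right_labels)


# Illustrative labels only.
mixed = ["flu", "flu", "not_flu", "not_flu"]
print(gini_impurity(mixed))  # 0.5
print(weighted_gini(["flu", "flu"], ["not_flu", "not_flu"]))  # 0.0
```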

Confusion Matrix: Prediction on Test Dataset

Accuracy = 86.6%
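
Accuracy is the share of test instances on the diagonal of the confusion matrix. The sketch below shows the calculation with made-up counts; the matrix `example` is hypothetical and does not reproduce the figures reported here.

```python
def accuracy_from_confusion_matrix(matrix):
    """Accuracy = correctly classified (diagonal) / all instances.

    matrix[i][j] is the number of instances of true class i
    that were predicted as class j.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical counts, for illustration only.
example = [
    [50, 10],
    [5, 35],
]
print(accuracy_from_confusion_matrix(example))  # 0.85
```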


Information Gain (Entropy)

  • depth = 23
  • leaf nodes = 106
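
For reference, the entropy criterion scores a node by H = -Σ p_k log2(p_k) and picks the split with the largest information gain, that is, the parent entropy minus the size-weighted entropy of the children. The sketch below shows only that calculation. The labels are illustrative assumptions, and this is not the tree-building code used for the results here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(parent_labels, child_label_groups):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = len(parent_labels)
    weighted = sum(
        (len(group) / total) * entropy(group) for group in child_label_groups
    )
    return entropy(parent_labels) - weighted


# Illustrative labels only: a split that separates the classes perfectly.
parent = ["flu", "flu", "not_flu", "not_flu"]
children = [["flu", "flu"], ["not_flu", "not_flu"]]
print(information_gain(parent, children))  # 1.0
```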

Confusion Matrix: Prediction on Test Dataset

Accuracy = 90.9%