Decision Tree in R : Flu Classification - clumsyspeedboat/Decision-Tree-Neo4j GitHub Wiki

R code

Python code


Decision Tree

We implemented the following decision tree algorithms to compare their accuracy:

  • CART
  • C4.5
  • C5.0
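As background for the comparison, the three algorithms differ mainly in how they score a candidate split: CART typically uses Gini impurity for classification, while C4.5 and C5.0 use entropy-based measures (information gain and gain ratio). The following is a minimal Python sketch of the two impurity measures only, not the code used in this project; the class labels "flu" and "other" are hypothetical placeholders.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels (criterion typically used by CART)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits (the basis of information gain in C4.5 / C5.0)."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


# ASSUMPTION: "flu" and "other" are made-up labels for illustration only.
mixed = ["flu", "flu", "other", "other"]
pure = ["flu", "flu", "flu", "flu"]

print(gini(mixed), entropy(mixed))  # an evenly mixed node is maximally impure
print(gini(pure), entropy(pure))    # a pure node has zero impurity
```

A split is preferred when it lowers the impurity of the child nodes relative to the parent, whichever measure is used.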

Acknowledgement

https://github.com/yoshihiko1218/COVID19ML

The dataset initially consisted of 1485 instances × 51 variables and was reduced to 13 variables. It contains many null values, which are hard to handle because the symptom variables are mostly entered as strings. We dropped most of the string variables that had many null values. Even after cutting the dataset down to 13 variables, many null values remained. We replaced all remaining null values:

  • with "0" in the numeric variables
  • with "unknown" in the string variables
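The replacement rule above can be sketched in plain Python as follows. This is an illustrative sketch, not the project's actual preprocessing code: the column names (`Age`, `Temperature`, `Sex`, `Cough`) and the set of values treated as null are assumptions made for the example.

```python
def fill_missing(rows, numeric_columns, string_columns):
    """Return copies of the rows with nulls replaced by 0 or "unknown"."""
    # ASSUMPTION: these markers are what counts as a null value.
    missing = (None, "", "NA", "NaN")
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in numeric_columns:
            if new_row.get(col) in missing:
                new_row[col] = 0
        for col in string_columns:
            if new_row.get(col) in missing:
                new_row[col] = "unknown"
        cleaned.append(new_row)
    return cleaned


# Hypothetical rows; the real dataset's columns may differ.
sample = [
    {"Age": 34, "Temperature": None, "Sex": "", "Cough": "yes"},
    {"Age": None, "Temperature": 38.5, "Sex": "F", "Cough": None},
]
for row in fill_missing(sample, ["Age", "Temperature"], ["Sex", "Cough"]):
    print(row)
```

Note that filling numeric nulls with 0 makes a missing value indistinguishable from a true zero, which is worth keeping in mind when reading the resulting trees.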

We used 3/4 of the Flu Classification dataset as the training dataset and 1/3 as the testing dataset.
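A random train/test split of this kind can be sketched as below. The fractions stated above (3/4 and 1/3) do not sum to one, so this sketch assumes a 3/4 training share and treats the remainder as the test set; the function name, the `train_fraction` parameter, and the seed are illustrative choices, not the project's code.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle the rows and cut them into a training part and a testing part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 1485 is the number of instances reported for the dataset;
# integers stand in for the actual rows.
train, test = train_test_split(range(1485))
print(len(train), len(test))  # → 1113 372
```

Changing `train_fraction` to `2 / 3` would instead leave one third of the rows for testing.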


CART - Classification & Regression Trees

  • depth = 8
  • leaf nodes = 15
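To make the two figures above concrete, here is a small Python sketch of how depth and leaf count can be computed on a tree. The nested-dictionary representation and the split names are hypothetical, and the convention that the root sits at depth 0 is an assumption; the tool that produced the reported numbers may count differently.

```python
def depth(node):
    """Number of edges on the longest path from this node down to a leaf."""
    if "children" not in node:
        return 0  # ASSUMPTION: a lone leaf (or the root alone) has depth 0
    return 1 + max(depth(child) for child in node["children"])


def count_leaves(node):
    """Number of terminal nodes (nodes without children)."""
    if "children" not in node:
        return 1
    return sum(count_leaves(child) for child in node["children"])


# Hypothetical toy tree: two splits and three leaves.
toy_tree = {
    "split": "Temperature",
    "children": [
        {"label": "class A"},
        {
            "split": "Cough",
            "children": [{"label": "class A"}, {"label": "class B"}],
        },
    ],
}

print(depth(toy_tree), count_leaves(toy_tree))  # → 2 3
```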

Confusion Matrix: Prediction on Test Dataset

Accuracy = 91.9 %
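For reference, accuracy is read off a confusion matrix as the sum of the diagonal (correct predictions) divided by the total number of test instances. A minimal Python sketch, using made-up toy counts rather than the project's actual confusion matrix:

```python
def accuracy(confusion):
    """Accuracy from a square confusion matrix (rows: actual, columns: predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# ASSUMPTION: toy counts for illustration only, not results from this project.
toy_confusion = [
    [8, 2],  # actual class 0: 8 predicted correctly, 2 misclassified
    [1, 9],  # actual class 1: 1 misclassified, 9 predicted correctly
]

print(f"{accuracy(toy_confusion):.1%}")  # → 85.0%
```

The same calculation applies to each of the three models compared on this page.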


Decision Tree (C4.5)

  • depth = 10
  • leaf nodes = 24

Confusion Matrix: Prediction on Test Dataset

Accuracy = 92.2 %


Decision Tree (C5.0)

  • depth = 12
  • leaf nodes = 26

Confusion Matrix: Prediction on Test Dataset

Accuracy = 93.8 %