Python Lab 2 - nimms9/CSEE5590_Python_ICP

Python and Deep Learning

Lab Assignment-2

TEAM ID: 8

VIDEO: https://drive.google.com/open?id=1tYUE3xBI1vcLOAXXxmB6bfatGTo93RWs

NAME 1: Nimmagadda Sudheer

ID: 17

NAME 2: Uppalapati Adisekhar

ID: 21

NAME 3: Mandava Mahesh

ID: 15

Introduction

  • This lab assignment applies classification, clustering, regression, and NLP (with NLTK) to different datasets.
  • We report accuracy for the classifiers, the silhouette score for clustering, and RMSE and R^2 for regression.

Objective

  • Our main objective in this assignment is to take a dataset, split it into training and test sets, and fit the corresponding classification, clustering, or regression model on the training data.
  • Finally, we measure how well each method performs.

Approaches

  • As specified in the lab assignment, for classification we used Naive Bayes, SVM, and KNN on a chosen dataset and reported the accuracy of each.
  • For clustering, we used KMeans on a chosen dataset and visualized the clusters.
  • For the NLP question, we used NLTK to perform tokenization, lemmatization, and trigram extraction on the given input file.
  • For regression, we performed multiple linear regression on a chosen dataset and reported RMSE and R^2.

Workflow

  • Team member 1: completed question 1, half of question 3, and the video.
  • Team member 2: completed question 2 and the other half of question 3.
  • Team member 3: completed question 4 and the documentation.

Datasets

Evaluation and Discussion

QUESTION-1

  • We used one-hot encoding for categorical features and mean imputation to handle null values.
  • The SVM classifier achieved higher accuracy than the other two classifiers (Naive Bayes and KNN) on the post-operative patient data.
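The preprocessing and comparison described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the team's actual code: the toy DataFrame, its column names (`temp`, `pressure`, `comfort`), the labels, and the helper `make_preprocess` are all hypothetical stand-ins for the post-operative patient data, and default hyperparameters are assumed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Hypothetical toy data: two categorical columns and one numeric column with nulls.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "temp": rng.choice(["low", "mid", "high"], size=n),
    "pressure": rng.choice(["stable", "unstable"], size=n),
    "comfort": rng.normal(10.0, 3.0, size=n),
})
df.loc[rng.choice(n, size=10, replace=False), "comfort"] = np.nan

# Hypothetical label: mostly follows "pressure", with about 10% of labels flipped.
y = ((df["pressure"] == "stable") ^ (rng.random(n) < 0.1)).astype(int).to_numpy()


def make_preprocess():
    """One-hot encode categorical columns and mean-impute the numeric one."""
    return ColumnTransformer(
        [
            ("onehot", OneHotEncoder(handle_unknown="ignore"), ["temp", "pressure"]),
            ("mean", SimpleImputer(strategy="mean"), ["comfort"]),
        ],
        sparse_threshold=0,  # always return a dense array (GaussianNB needs dense input)
    )


X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    clf = Pipeline([("prep", make_preprocess()), ("model", model)])
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Putting the encoder and imputer inside the pipeline means the column mean is learned from the training split only, so no test-set information leaks into preprocessing. The accuracies printed on this toy data say nothing about the results on the real dataset.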

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/1.JPG?raw=true

QUESTION-2

  • We used the elbow method to find the optimal number of clusters and evaluated the silhouette score.
  • We visualized the clusters using matplotlib.
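As a rough sketch of the elbow and silhouette steps above, the snippet below fits KMeans for several values of k and records the inertia (the quantity plotted for the elbow curve) and the silhouette score. The synthetic blobs from `make_blobs` and the range of k values are assumptions for illustration; the dataset used in the lab is not shown here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data: three synthetic 2-D blobs standing in for the chosen dataset.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

inertias = {}     # within-cluster sum of squares per k (for the elbow curve)
silhouettes = {}  # mean silhouette score per k

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

# One common heuristic: take the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest silhouette score:", best_k)
```

Plotting `inertias` against k with matplotlib gives the elbow curve; the "elbow" is the point after which adding clusters reduces inertia only slightly. The silhouette score offers a second check on the same choice.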

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/2.JPG?raw=true

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/2a.JPG?raw=true

QUESTION-3

  • First, we read the data from the file, tokenized the text into words, and applied lemmatization to each word.
  • We also extracted the top 10 trigrams by count and displayed the sentences containing the most repeated trigrams.
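The trigram counting step above can be illustrated with a small sketch. To keep it runnable without downloading NLTK corpora, it swaps in standard-library regex tokenization in place of the NLTK tokenizers and leaves out lemmatization; the sample text and the helper names `tokenize` and `trigrams` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the input file.
text = (
    "The quick brown fox jumps over the lazy dog. "
    "A quick brown fox is hard to catch. "
    "The lazy dog sleeps all day."
)

# Naive sentence split on end-of-sentence punctuation followed by whitespace.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def trigrams(tokens):
    """Return every run of three consecutive tokens."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


# Count trigrams within each sentence so they never span a sentence boundary.
counts = Counter()
for sentence in sentences:
    counts.update(trigrams(tokenize(sentence)))

top_10 = counts.most_common(10)
for gram, count in top_10:
    print(" ".join(gram), count)

# Sentences that contain at least one of the most repeated trigrams.
max_count = top_10[0][1]
most_repeated = {gram for gram, count in counts.items() if count == max_count}
matching = [s for s in sentences if most_repeated & set(trigrams(tokenize(s)))]
print(matching)
```

With NLTK installed and its data available, the regex tokenizer could be replaced by NLTK's word tokenizer and a lemmatizer applied to each token before counting; the counting and sentence selection would stay the same.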

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/3b.JPG?raw=true

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/3c.JPG?raw=true

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/3d.JPG?raw=true

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/3f.JPG?raw=true

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/3h.JPG?raw=true

QUESTION-4

  • We fit a multiple linear regression model, applied one-hot encoding to the chosen dataset, and reported the RMSE and R^2.
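A minimal sketch of this workflow, assuming scikit-learn and pandas, is shown below. The synthetic housing-style data, its column names (`area`, `rooms`, `zone`), and the coefficients used to generate the target are hypothetical; they only illustrate one-hot encoding followed by multiple linear regression and the two reported metrics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: two numeric features and one categorical feature.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "area": rng.uniform(50, 200, size=n),
    "rooms": rng.integers(1, 6, size=n),
    "zone": rng.choice(["north", "south", "east"], size=n),
})

# Hypothetical target: linear in the features plus Gaussian noise.
zone_effect = df["zone"].map({"north": 10.0, "south": -5.0, "east": 0.0})
price = 2.0 * df["area"] + 15.0 * df["rooms"] + zone_effect + rng.normal(0, 5, size=n)

# One-hot encode the categorical column; drop_first avoids a redundant dummy column.
X = pd.get_dummies(df, columns=["zone"], drop_first=True, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))  # root mean squared error
r2 = r2_score(y_test, pred)                              # coefficient of determination
print(f"RMSE = {rmse:.2f}, R^2 = {r2:.3f}")
```

RMSE is in the same units as the target, so it reads as a typical prediction error, while R^2 is the fraction of the target's variance that the model explains. The values printed here come from synthetic data and do not reflect the lab's results.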

https://github.com/nimms9/CSEE5590_Python_ICP/blob/master/Lab-2/Documentation/4.JPG?raw=true

