python lab 2 - nimms9/CSEE5590_Python_ICP GitHub Wiki

Python and Deep Learning

TEAM ID: 8

NAME 1: Nimmagadda Sudheer

ID: 17

NAME 2: Uppalapati Adisekhar

ID: 21

NAME 3: Mandava Mahesh

ID: 15

This lab assignment applies classification, clustering, regression, and NLP (NLTK) to different datasets.
We evaluate each method: accuracy for classification, silhouette score for clustering, and RMSE and R^2 for regression.

Our main objective in this assignment is to take a dataset, split it into training and test sets, and fit the corresponding classification, clustering, or regression model on the training data.
Finally, we measure how well each method performs.

As specified in the lab assignment, for classification we used Naive Bayes, SVM, and KNN on a chosen dataset and reported the accuracy of each.
For clustering we used KMeans on a chosen dataset and visualized the clusters.
For the NLP question we used NLTK to perform tokenization, lemmatization, and trigram extraction on the given input file.
For regression we performed multiple linear regression on a chosen dataset and reported RMSE and R^2.

QUESTION-1

We used one-hot encoding for the categorical features and mean imputation to handle null values.
The SVM classifier achieved higher accuracy than the other two classifiers (Naive Bayes and KNN) on the post-operative patient data.
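
The preprocessing and comparison described above can be sketched roughly as follows. This is a minimal illustration, not the lab's actual code: it assumes scikit-learn and pandas (the report does not name the libraries used), and the column names and synthetic data are hypothetical stand-ins for the post-operative patient dataset.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Hypothetical toy data: two categorical columns and one numeric column.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "core_temp": rng.choice(["low", "mid", "high"], size=n),
    "bp": rng.choice(["low", "mid", "high"], size=n),
    "comfort": rng.integers(0, 21, size=n).astype(float),
})
# Synthetic label derived from the categorical columns (illustration only).
y = ((df["core_temp"] == "high") | (df["bp"] == "low")).astype(int)
# Knock out a few numeric values so there is something to impute.
df.loc[rng.choice(n, size=10, replace=False), "comfort"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

# One-hot encode the categorical columns, fill numeric nulls with the mean.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["core_temp", "bp"]),
        ("num", SimpleImputer(strategy="mean"), ["comfort"]),
    ],
    sparse_threshold=0.0,  # always return a dense array (GaussianNB needs one)
)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    clf = make_pipeline(clone(preprocess), model)
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {scores[name]:.3f}")
```

Putting the preprocessing inside the pipeline means the imputation mean is learned from the training split only, so no information from the test split leaks into the model.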

QUESTION-2

We used the elbow method to find the optimal number of clusters and evaluated the clustering with the silhouette score.
We visualized the clusters using Matplotlib.
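
A rough sketch of the elbow and silhouette computation is given below. It assumes scikit-learn, and the data comes from `make_blobs` as a placeholder, since the report does not say which dataset was clustered; the range of k values is likewise an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Placeholder data: 300 two-dimensional points around 4 centers.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette coefficient for each k

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")

# The k with the highest silhouette score is one reasonable choice.
best_k = max(silhouettes, key=silhouettes.get)
print("k with highest silhouette:", best_k)
```

Plotting `inertias` against k with Matplotlib gives the elbow curve: the point where the inertia stops dropping sharply suggests the number of clusters, and the silhouette score serves as a second check on that choice.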

QUESTION-3

First, we read the data from the file, tokenized the text into words, and applied lemmatization to each word.
We also extracted the top 10 trigrams by count and displayed the sentences containing the most frequent trigrams.
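
The trigram counting and sentence lookup can be illustrated with the sketch below. To stay self-contained it uses only the Python standard library as a stand-in: a regular expression replaces NLTK's tokenizers, and lemmatization is left out. In the NLTK version, `nltk.sent_tokenize`, `nltk.word_tokenize`, `WordNetLemmatizer`, and `nltk.trigrams` would take over these steps. The sample text is made up for the example.

```python
import re
from collections import Counter

# Made-up sample text standing in for the input file.
text = (
    "The quick brown fox jumps over the lazy dog. "
    "A quick brown fox is rare. "
    "The lazy dog sleeps all day. "
    "Nobody saw the quick brown fox today."
)

# Naive sentence split on ., ! or ? followed by whitespace.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def trigrams(tokens):
    """Return all runs of three consecutive tokens."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


# Count trigrams sentence by sentence, so none spans a sentence boundary.
counts = Counter()
for sentence in sentences:
    counts.update(trigrams(tokenize(sentence)))

top10 = counts.most_common(10)
for gram, count in top10:
    print(count, " ".join(gram))

# Show every sentence that contains the most frequent trigram.
best_trigram, best_count = top10[0]
matching = [s for s in sentences if best_trigram in trigrams(tokenize(s))]
for s in matching:
    print(s)
```

Here the most frequent trigram is "quick brown fox", which occurs three times, so three of the four sentences are printed.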

QUESTION-4

We used a multiple linear regression model with one-hot encoding on the chosen dataset and reported the RMSE and R^2.
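
A minimal sketch of this step follows, assuming scikit-learn and pandas. The dataset is synthetic and its column names (`area`, `rooms`, `city`) are hypothetical, since the report does not name the dataset that was used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: two numeric features and one categorical feature.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "area": rng.uniform(50, 200, size=n),
    "rooms": rng.integers(1, 6, size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
})
city_effect = df["city"].map({"A": 0.0, "B": 20.0, "C": 50.0})
price = 2.0 * df["area"] + 10.0 * df["rooms"] + city_effect + rng.normal(0, 5, size=n)

# One-hot encode the categorical column; drop_first avoids a redundant column.
X = pd.get_dummies(df, columns=["city"], drop_first=True, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=1
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# RMSE is the square root of the mean squared error; R^2 is the share of
# variance in the target that the model explains.
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = r2_score(y_test, pred)
print(f"RMSE: {rmse:.2f}  R^2: {r2:.3f}")
```

RMSE is expressed in the same units as the target, which makes it easy to interpret, while R^2 is unitless and makes it easier to compare fits across datasets.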

                                                     THE END