python lab 2 - nimms9/CSEE5590_Python_ICP GitHub Wiki
Python and Deep Learning
Lab Assignment-2
TEAM ID :8
VIDEO 🔗: https://drive.google.com/open?id=1tYUE3xBI1vcLOAXXxmB6bfatGTo93RWs
NAME 1: Nimmagadda Sudheer
ID: 17
NAME 2: Uppalapati Adisekhar
ID: 21
NAME 3: Mandava Mahesh
ID: 15
Introduction
- This lab assignment applies classification, clustering, regression, and NLP (NLTK) to different datasets.
- We evaluate each method with a suitable metric: accuracy for classification, silhouette score for clustering, and RMSE and R^2 for regression.
Objective
- Our main objective in this assignment is to take a dataset, split it into training and test sets, and fit the corresponding classification, clustering, or regression model on the training data.
- Finally, we measure how well each method performs.
Approaches
- As specified in the lab assignment, for classification we used the Naive Bayes, SVM, and KNN algorithms on a chosen dataset and reported the accuracy of each.
- For clustering, we used KMeans on a chosen dataset and visualized the clusters.
- For the NLP question, we used NLTK to perform tokenization, lemmatization, and trigram extraction on the given input file.
- For regression, we performed multiple linear regression on a chosen dataset and reported RMSE and R^2.
Workflow
- Team member-1: completed question 1, half of question 3, and the video.
- Team member-2: completed question 2 and the other half of question 3.
- Team member-3: completed question 4 and the documentation.
Datasets
Evaluation and Discussion
QUESTION-1
- We used one-hot encoding for categorical features and mean imputation to handle null values.
- The SVM classifier achieved higher accuracy than the other two classifiers on the post-operative patient data.
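The preprocessing and comparison described above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the toy DataFrame, its column names (`temp_level`, `comfort`, `label`), and the hyperparameters are all assumptions made up for the example, and the accuracies it prints say nothing about the post-operative patient data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Hypothetical toy data standing in for the chosen dataset.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "temp_level": rng.choice(["low", "mid", "high"], size=n),  # categorical feature
    "comfort": rng.normal(10.0, 3.0, size=n),                  # numeric feature
})
df["label"] = ((df["comfort"] > 10) | (df["temp_level"] == "high")).astype(int)
# Inject some null values into the numeric column.
df.loc[rng.choice(n, size=10, replace=False), "comfort"] = np.nan

X = df[["temp_level", "comfort"]]
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One-hot encode the categorical column, fill numeric nulls with the column mean.
preprocess = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["temp_level"]),
        ("mean_impute", SimpleImputer(strategy="mean"), ["comfort"]),
    ],
    sparse_threshold=0.0,  # always return a dense array (GaussianNB needs one)
)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

accuracies = {}
for name, model in models.items():
    clf = make_pipeline(preprocess, model)
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {accuracies[name]:.3f}")
```

Putting the encoder and imputer inside the pipeline means the column mean and the category list are learned from the training split only, so nothing from the test set leaks into preprocessing.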
QUESTION-2
- We used the elbow method to find the optimal number of clusters and evaluated the silhouette score.
- We visualized the clusters using matplotlib.
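One way the elbow and silhouette steps could look with scikit-learn is sketched below. The synthetic blobs, the chosen centers, and the range of `k` values are assumptions for illustration only; the dataset actually used in the lab is not shown here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.7,
    random_state=0,
)

inertias = {}     # within-cluster sum of squares, used for the elbow curve
silhouettes = {}  # mean silhouette coefficient per k
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")

# Pick the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("best k by silhouette:", best_k)
```

Plotting `inertias` against `k` with matplotlib gives the elbow curve: the point where the drop in inertia flattens out suggests the number of clusters, and the silhouette score serves as a second check on that choice.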
QUESTION-3
- First, we read the data from the file, tokenized the text into words, and applied lemmatization to each word.
- We also extracted the top 10 trigrams by count and displayed the sentences containing the most frequent trigrams.
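The trigram counting and sentence lookup can be illustrated with a small standard-library sketch. It is a stand-in, not the NLTK code used in the lab: the sample text is invented, the regex tokenizer and sentence splitter are simplifications, and lemmatization is left out. In NLTK the same steps would typically go through `word_tokenize`, `WordNetLemmatizer`, and `nltk.util.ngrams`.

```python
import re
from collections import Counter

# Hypothetical input text standing in for the given input file.
text = (
    "The quick brown fox jumps over the lazy dog. "
    "A quick brown fox is hard to catch. "
    "The dog sleeps all day."
)

# Naive sentence split on end-of-sentence punctuation (regex stand-in).
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def trigrams(tokens):
    """Return all consecutive three-token tuples."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


# Count trigrams sentence by sentence so none spans a sentence boundary.
counts = Counter(tg for s in sentences for tg in trigrams(tokenize(s)))
top = counts.most_common(10)

# Show the sentences that contain the most frequent trigram.
best_trigram, best_count = top[0]
matching = [s for s in sentences if best_trigram in trigrams(tokenize(s))]

print(best_trigram, best_count)
for s in matching:
    print(s)
```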
QUESTION-4
- We used a multiple regression model with one-hot encoding on the chosen dataset and reported the RMSE and R^2.
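A minimal sketch of this regression workflow is given below, assuming a made-up dataset. The columns `area`, `rooms`, `city`, and `price`, along with the coefficients used to generate them, are hypothetical and only serve to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: two numeric features and one categorical feature.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.uniform(50, 150, size=n),
    "rooms": rng.integers(1, 6, size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
})
city_effect = df["city"].map({"A": 0.0, "B": 20.0, "C": 40.0})
df["price"] = (
    2.0 * df["area"] + 10.0 * df["rooms"] + city_effect + rng.normal(0, 5, size=n)
)

# One-hot encode the categorical column; drop one level to avoid redundancy.
X = pd.get_dummies(
    df[["area", "rooms", "city"]], columns=["city"], drop_first=True, dtype=float
)
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))  # root mean squared error
r2 = r2_score(y_test, pred)                              # coefficient of determination
print(f"RMSE: {rmse:.2f}  R^2: {r2:.3f}")
```

RMSE is in the same units as the target, so it reads as a typical prediction error, while R^2 gives the share of variance in the target that the model explains.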
THE END