ICP 4 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki
Python and Deep Learning: Special Topics
Rajeshwari Sai Aishwarya Puppala
Student ID: 16298162
Class ID: 35
In class programming: 4
Objectives:
- Find the correlation between the ‘survived’ (target) column and the ‘sex’ column for the Titanic use case in class. Do you think we should keep this feature?
- Implement the Naïve Bayes method using the scikit-learn library. Use the dataset available at https://umkc.box.com/s/ea6wn1cidukan67t02j60nmp1ljln3kd and use train_test_split to create the training and testing parts.
- Implement the linear SVM method using the scikit-learn library. Use the same dataset as above and use train_test_split to create the training and testing parts. Evaluate the model on the testing part using score. Which algorithm gave you better accuracy? Can you justify why?
Finding the correlation
- Imported the data using pathlib and pandas and created a data frame.
- Selected the Survived and Sex columns from the data and grouped by Sex to see which category is more strongly associated with survival.
- We can see that the female category has the higher value.
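The steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: the six rows are hypothetical toy data standing in for the Titanic CSV, and the 0/1 encoding of Sex is an assumption made here so that a Pearson correlation can be computed alongside the group comparison.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic training data;
# the real exercise loads the course CSV into the data frame instead.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 1],
})

# The mean of the 0/1 Survived column per group is the survival rate per sex.
survival_by_sex = df[["Sex", "Survived"]].groupby("Sex")["Survived"].mean()

# Assumed encoding (male=0, female=1) so that a correlation can be computed.
sex_code = df["Sex"].map({"male": 0, "female": 1})
corr = df["Survived"].corr(sex_code)

print(survival_by_sex)
print("Correlation between Survived and encoded Sex:", corr)
```

A large gap between the two group means, or a correlation well away from zero, would suggest the feature carries information about the target.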
Code
Output
Naive Bayes
- Imported the required libraries, i.e. GaussianNB from sklearn.naive_bayes, pandas, accuracy_score, and pathlib.
- Created a data frame from the dataset glass.csv.
- Divided the dataset into the dependent variable and the independent variables.
- The dependent variable is "Type".
- Divided the dataset into X_train, X_test, y_train, and y_test, with 30% test data and 70% training data.
- Initialized the GaussianNB classifier.
- Fit the training data to the classifier.
- Predicted the dependent variable for both the training and test data.
- Calculated the accuracy_score by passing the predicted and true values for the training and test data.
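A minimal sketch of these steps is given below, under stated assumptions. The helper name `evaluate_gaussian_nb`, the `random_state` value, and the synthetic data frame (feature columns `f1`, `f2`) are hypothetical and used only so the example runs without glass.csv; with the real data, the frame would come from `pd.read_csv` on that file.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate_gaussian_nb(df, target="Type", test_size=0.3, random_state=0):
    """Hypothetical helper: 70/30 split, fit GaussianNB, return (train, test) accuracy."""
    X = df.drop(columns=[target])   # independent variables
    y = df[target]                  # dependent variable
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    clf = GaussianNB()
    clf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    return train_acc, test_acc


# Synthetic stand-in for glass.csv (assumption): two features and a "Type" label.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"f1": rng.normal(size=100), "f2": rng.normal(size=100)})
toy["Type"] = (toy["f1"] > 0).astype(int) + 1

nb_train_acc, nb_test_acc = evaluate_gaussian_nb(toy)
print("train accuracy:", nb_train_acc)
print("test accuracy:", nb_test_acc)
```

Reporting both numbers makes it easier to spot overfitting: a training accuracy far above the test accuracy is a warning sign.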
Code
Output
SVM - Linear
- Imported the required libraries, i.e. SVC, pandas, accuracy_score, and pathlib.
- Created a data frame from the dataset glass.csv.
- Divided the dataset into the dependent variable and the independent variables.
- The dependent variable is "Type".
- Divided the dataset into X_train, X_test, y_train, and y_test, with 30% test data and 70% training data.
- Initialized the SVM classifier with a linear kernel.
- Fit the training data to the classifier.
- Predicted the dependent variable for both the training and test data.
- Calculated the accuracy_score by passing the predicted and true values for the training and test data.
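The same workflow with a linear-kernel SVM might look like the sketch below. As before, this is an illustration under assumptions rather than the original code: the helper name `evaluate_linear_svm`, the `random_state` value, and the synthetic data frame are hypothetical stand-ins for the glass.csv setup.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def evaluate_linear_svm(df, target="Type", test_size=0.3, random_state=0):
    """Hypothetical helper: 70/30 split, fit a linear-kernel SVC, return accuracies."""
    X = df.drop(columns=[target])   # independent variables
    y = df[target]                  # dependent variable
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    clf = SVC(kernel="linear")
    clf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    return train_acc, test_acc


# Synthetic stand-in for glass.csv (assumption): classes split by a straight line.
rng = np.random.default_rng(1)
toy = pd.DataFrame({"f1": rng.normal(size=100), "f2": rng.normal(size=100)})
toy["Type"] = np.where(toy["f1"] + toy["f2"] > 0, 1, 2)

svm_train_acc, svm_test_acc = evaluate_linear_svm(toy)
print("train accuracy:", svm_train_acc)
print("test accuracy:", svm_test_acc)
```

Running both classifiers on the same split (same `random_state`) keeps the accuracy comparison between Naive Bayes and the linear SVM fair.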
Code
Output