ICP 4 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki

Python and Deep Learning: Special Topics

Rajeshwari Sai Aishwarya Puppala

Student ID: 16298162

Class ID: 35

In class programming: 4

Objectives:

  1. Find the correlation between the ‘Survived’ (target) column and the ‘Sex’ column for the Titanic use case covered in class. Should we keep this feature?

  2. Implement the Naïve Bayes method using the scikit-learn library. Use the dataset available at

    https://umkc.box.com/s/ea6wn1cidukan67t02j60nmp1ljln3kd Use train_test_split to create the training and testing parts.

  3. Implement the linear SVM method using the scikit-learn library. Use the same dataset as above.

    Use train_test_split to create the training and testing parts.

    Evaluate the model on the testing part using score. Which algorithm gave better accuracy? Can you justify why?

Finding the correlation

  • Imported the data using pathlib and pandas and created a data frame
  • Selected the Survived and Sex columns from the data and grouped by Sex to compare survival between the two groups.
  • We can see that the Female category has the higher value.
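
The grouping step above can be sketched as follows. This is a minimal illustration and not the original code from this page: the function name `survival_vs_sex` and the 0/1 encoding of Sex are assumptions, while the column names `Survived` and `Sex` come from the text.

```python
import pandas as pd


def survival_vs_sex(df):
    """Return (mean Survived per Sex group, Pearson correlation with encoded Sex).

    Assumes df has a 0/1 'Survived' column and a 'Sex' column holding
    the strings 'male' and 'female'.
    """
    # Mean of a 0/1 column per group is the share of survivors in that group.
    rate_by_sex = df.groupby("Sex")["Survived"].mean()

    # Assumed encoding: female -> 1, male -> 0, so a positive correlation
    # means that being female goes together with surviving.
    sex_encoded = df["Sex"].map({"male": 0, "female": 1})
    correlation = df["Survived"].corr(sex_encoded)

    return rate_by_sex, correlation
```

The function can be called on a data frame loaded with `pd.read_csv`, for example `survival_vs_sex(pd.read_csv("train.csv"))`, where `train.csv` is a placeholder file name. A large gap between the two group means, or a correlation far from zero, suggests the feature carries information about the target.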

Code

Output

Naive Bayes

  • Imported the required libraries, i.e. GaussianNB from sklearn.naive_bayes, pandas, accuracy_score, and pathlib
  • Created a data frame with the data set glass.csv
  • Divided the data set into the dependent variable and independent variables
  • The dependent variable is "Type"
  • Divided the data set into X_train, X_test, y_train, and y_test with 30% test data and 70% training data
  • Initialize the GaussianNB classifier
  • Fit the train data to the classifier
  • Predict the dependent variable ("Type") for both the train and test data
  • Calculate the accuracy_score by passing the true and predicted values for the train and test data
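
The steps listed above can be sketched as a single helper. This is a hedged illustration and not the original code from this page: the function name `evaluate_gaussian_nb` and the `random_state` value are assumptions, and the data frame is expected to contain only numeric feature columns plus the "Type" column.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate_gaussian_nb(df, target="Type", test_size=0.3, random_state=0):
    """Fit GaussianNB on a 70/30 split and return (train accuracy, test accuracy)."""
    # Independent variables: every column except the target.
    X = df.drop(columns=[target])
    # Dependent variable: the class label.
    y = df[target]

    # random_state is an assumed value; it only makes the split repeatable.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    model = GaussianNB()
    model.fit(X_train, y_train)

    # accuracy_score takes the true labels first, then the predictions.
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    return train_acc, test_acc
```

With the data set from the steps above, a call would look like `evaluate_gaussian_nb(pd.read_csv("glass.csv"))`. Comparing the two returned accuracies gives a rough check for overfitting.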

Code

Output

SVM - Linear

  • Imported the required libraries, i.e. SVC from sklearn.svm, pandas, accuracy_score, and pathlib
  • Created a data frame with the dataset glass.csv
  • Divided the dataset into the dependent variable and independent variables
  • The dependent variable is "Type"
  • Divided the data set into X_train, X_test, y_train, and y_test with 30% test data and 70% training data
  • Initialize the SVM classifier with a linear kernel
  • Fit the train data to the classifier
  • Predict the dependent variable ("Type") for both the train and test data
  • Calculate the accuracy_score by passing the true and predicted values for the train and test data
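
A minimal sketch of the SVM steps above, under stated assumptions: the function name `evaluate_linear_svc` and the `random_state` value are illustrative, the remaining `SVC` parameters are left at scikit-learn's defaults, and this is not the original code from this page.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def evaluate_linear_svc(df, target="Type", test_size=0.3, random_state=0):
    """Fit SVC(kernel='linear') on a 70/30 split and return (train, test) accuracy."""
    X = df.drop(columns=[target])  # independent variables
    y = df[target]                 # dependent variable

    # random_state is an assumed value; it only makes the split repeatable.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # kernel="linear" selects the linear kernel named in the steps above.
    model = SVC(kernel="linear")
    model.fit(X_train, y_train)

    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    return train_acc, test_acc
```

Keeping `random_state` fixed means repeated runs evaluate on the same test rows, so accuracy figures from different runs or different classifiers stay comparable.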

Code

Output

Which algorithm gave better accuracy?

We can conclude that SVM with a linear kernel achieves better accuracy than Naive Bayes on this data set.