ICP4 - narhirep/Python-Deep-Learning GitHub Wiki
Welcome to In-Class Programming 4:
Description: This assignment covers several machine learning classification algorithms and the use of the scikit-learn package to solve classification problems.
Objective: To solve a problem by implementing the Naïve Bayes and SVM methods with the scikit-learn library, to find the correlation between a feature and the target, and to find out which algorithm provides better accuracy.
Implementation:
1. Find the correlation between the ‘survived’ (target) column and the ‘sex’ column for the Titanic use case in class. Do you think we should keep this feature?
I imported pandas as pd, mapped the values of the ‘sex’ column to numeric codes (0/1), and then computed the correlation between the ‘survived’ and ‘sex’ columns, as shown in the screenshot. This feature lets us compare how many males and how many females survived.
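The mapping and correlation step can be sketched roughly as follows. This is a minimal illustration, not the code from the screenshot: the small DataFrame is made-up stand-in data rather than the real Titanic file, and the choice of which value maps to 0 or 1 is an assumption.

```python
import pandas as pd

# Made-up sample standing in for the Titanic data (NOT the real dataset).
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "sex": ["male", "female", "female", "male",
            "female", "male", "female", "male"],
})

# Encode the categorical column as 0/1 so a correlation can be computed.
# Which label gets 0 and which gets 1 is an arbitrary choice here.
df["sex"] = df["sex"].map({"male": 0, "female": 1})

# Pearson correlation between the target and the encoded feature.
corr = df["survived"].corr(df["sex"])
print(f"Correlation between survived and sex: {corr:.3f}")

# Survival rate per group, which is often easier to read than the coefficient.
survival_rate = df.groupby("sex")["survived"].mean()
print(survival_rate)
```

A correlation whose magnitude is clearly away from zero suggests the feature carries information about the target and is worth keeping; the sign only reflects how the 0/1 codes were assigned.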
2. Implement the Naïve Bayes method using the scikit-learn library. Use train_test_split to create the training and testing parts. Evaluate the model on the test part using score and classification_report(y_true, y_pred).
To implement this, I imported the required sklearn modules, took the 'type' column as the target (y), and used the remaining columns as the features (x). train_test_split with a test size of 0.25 then divides the data into two parts: 75% for training (x_train, y_train) and 25% for testing (x_test, y_test). The GaussianNB model is trained on the training part and then predicts labels for x_test, which are compared against y_test. Finally, classification_report from the metrics module gives the precision, recall, and related metrics for each of the 7 classes.
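The steps above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (generated with make_classification, with 3 classes instead of the 7 in the assignment data), the feature names f0..f5 are hypothetical, and random_state is fixed just to make the run repeatable.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (NOT the assignment dataset); column names are hypothetical.
X_arr, y_arr = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
df = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
df["type"] = y_arr

# 'type' is the target; every other column is a feature.
X = df.drop(columns="type")
y = df["type"]

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train Gaussian Naive Bayes on the 75% split and predict the held-out labels.
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_pred = nb_model.predict(X_test)

# score() returns mean accuracy on the test part.
nb_accuracy = nb_model.score(X_test, y_test)
print(f"Naive Bayes accuracy: {nb_accuracy:.3f}")

# Per-class precision, recall, and F1.
print(classification_report(y_test, nb_pred))
```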
3. Implement the linear SVM method using the scikit-learn library. Use train_test_split to create the training and testing parts. Evaluate the model on the test part using score and classification_report(y_true, y_pred).
Using a linear kernel, I repeated the same procedure for the SVM model, which gave higher accuracy than Naïve Bayes. A likely reason is that Naïve Bayes assumes the features are independent of each other given the class, whereas SVM makes no such assumption and can account for features that are related to each other.
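A comparable sketch for the SVM step is shown below. It assumes SVC with kernel="linear" as the linear SVM (LinearSVC would be another option) and again uses synthetic stand-in data with hypothetical column names, so its accuracy says nothing about the assignment's actual results.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the assignment dataset); column names are hypothetical.
X_arr, y_arr = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
df = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
df["type"] = y_arr

X = df.drop(columns="type")
y = df["type"]

# Same 75/25 split as in the Naive Bayes step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Support vector classifier with a linear kernel.
svm_model = SVC(kernel="linear")
svm_model.fit(X_train, y_train)
svm_pred = svm_model.predict(X_test)

# Mean accuracy on the test part, then the per-class report.
svm_accuracy = svm_model.score(X_test, y_test)
print(f"Linear SVM accuracy: {svm_accuracy:.3f}")
print(classification_report(y_test, svm_pred))
```

Keeping the split and random_state identical across both models means any accuracy difference comes from the classifier rather than from a different train/test partition.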
Video: ICP4 Link
Conclusion: In this ICP I learned about machine learning classification algorithms and how to use the scikit-learn library, which makes these tasks easy to implement with fewer lines of code.