Module 1_ICP 4: Machine Learning with Scikit Learn - acikgozmehmet/PythonDeepLearning GitHub Wiki

Machine Learning with Scikit-Learn

Objectives:

The following topics are covered.

Machine Learning
Classification
Scientific packages

Overview

a. Machine learning algorithms

b. Classification algorithms

c. Scikit-learn

d. Advanced concepts related to machine learning algorithms, such as overfitting, underfitting, cross-validation, and evaluation of clustering methods

In-Class Programming

1. Find the correlation between the ‘survived’ (target) column and the ‘sex’ column for the Titanic use case discussed in class.


Do you think we should keep this feature?

Looking at the correlation between sex and survival, we see a significant correlation between Sex = 1 and the survival rate. Therefore, this feature should be kept for modelling.
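A minimal pandas sketch of this check is shown here. It assumes the common encoding female to 1 and male to 0 (the class notebook may encode Sex differently), and the column names Sex and Survived follow the Kaggle Titanic CSV, which is also an assumption. The eight rows are made-up toy data, not the Titanic dataset, so the printed numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic training data.
# These values are invented for illustration; load the real CSV in practice.
df = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male", "female", "male", "male"],
    "Survived": [1, 0, 1, 0, 1, 1, 0, 0],
})

# Assumed encoding: female -> 1, male -> 0
df["Sex"] = df["Sex"].map({"female": 1, "male": 0})

# Survival rate for each encoded sex value
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)

# Pearson correlation between the encoded feature and the target
corr = df["Sex"].corr(df["Survived"])
print(f"corr(Sex, Survived) = {corr:.3f}")
```

On this toy data the survival rate is 1.0 for Sex = 1 and 0.2 for Sex = 0, giving a Pearson correlation of about 0.775. With the real data, a clearly positive value is what indicates that passengers with Sex encoded as 1 survived more often.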

2. Implement the Naïve Bayes method using the scikit-learn library

Use the dataset available at https://umkc.box.com/s/ea6wn1cidukan67t02j60nmp1ljln3kd
Use train_test_split to create the training and testing parts
Evaluate the model on the testing part using score and classification_report(y_true, y_pred)
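A minimal sketch of these steps follows. The dataset behind the Box link is not reproduced here, so scikit-learn's bundled wine dataset (load_wine) stands in for it; swap in the real features and labels. GaussianNB is an assumed choice of Naïve Bayes variant (it suits continuous features), and test_size=0.3 and random_state=0 are illustrative parameters.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset (assumption): replace with the assignment's data
X, y = load_wine(return_X_y=True)

# Hold out 30% of the samples for testing (illustrative settings)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Gaussian Naive Bayes: treats features as conditionally independent given
# the class and models each feature as normally distributed within a class
nb = GaussianNB()
nb.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data
nb_accuracy = nb.score(X_test, y_test)
y_pred = nb.predict(X_test)

print(f"Naive Bayes accuracy: {nb_accuracy:.3f}")
# classification_report expects (y_true, y_pred) in that order
print(classification_report(y_test, y_pred))
```

The report adds per-class precision, recall and F1 to the single accuracy number, which helps when some classes have only a handful of test samples.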


3. Implement the linear SVM method using the scikit-learn library

Use the same dataset as above
Use train_test_split to create the training and testing parts
Evaluate the model on the testing part using score and classification_report(y_true, y_pred)
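The linear SVM version can be sketched the same way, again with load_wine standing in for the assignment dataset and the same illustrative split. SVC(kernel="linear") is one way to get a linear SVM in scikit-learn (LinearSVC is another); the StandardScaler step is an addition of this sketch, not something the assignment asks for.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption): replace with the assignment's data
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Standardize features, then fit an SVM with a linear kernel
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)

# Mean accuracy on the held-out test set
svm_accuracy = svm.score(X_test, y_test)
y_pred = svm.predict(X_test)

print(f"Linear SVM accuracy: {svm_accuracy:.3f}")
print(classification_report(y_test, y_pred))
```

Scaling matters for SVMs because the margin is measured with distances in feature space, so features with large numeric ranges would otherwise dominate the decision boundary.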


Which algorithm gave you better accuracy? Can you justify why?

When we compare the accuracy of the two algorithms, we see that the Naïve Bayes algorithm achieves higher accuracy than the linear SVM. This is in line with the nature of the algorithms and of our data: Naïve Bayes needs relatively little training data to estimate its parameters, while a linear SVM can only draw linear decision boundaries between the classes. The results also depend on the size of the dataset used for modelling (214 samples in total: 149 in the training set and 65 in the test set) and on the number of classes (0, 1, 2, 3, 4, 5, 6, 7).
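Two quick checks support this reasoning. First, the 149 training and 65 test samples add up to 214, and train_test_split with test_size=0.3 (an assumed setting) on 214 samples produces exactly that split, because scikit-learn rounds the test size up: ceil(0.3 × 214) = 65. Second, a single split of so few samples gives a noisy accuracy estimate, so 5-fold cross-validation compares the two models more fairly. The sketch again uses load_wine as a stand-in dataset, so its scores say nothing about the assignment's data.

```python
import math

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Check 1: split sizes for 214 samples with an assumed test_size of 0.3
n_samples = 214
dummy_X = np.zeros((n_samples, 1))
dummy_y = np.zeros(n_samples)
X_tr, X_te, _, _ = train_test_split(dummy_X, dummy_y, test_size=0.3, random_state=0)
print(len(X_tr), len(X_te))        # 149 65
print(math.ceil(0.3 * n_samples))  # 65: the test size is rounded up

# Check 2: compare both models with 5-fold cross-validation (stand-in data)
X, y = load_wine(return_X_y=True)
models = {
    "Naive Bayes": GaussianNB(),
    "Linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}

cv_means = {}
for name, model in models.items():
    # For classifiers, cv=5 uses stratified folds and scores by accuracy
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean and standard deviation across folds shows whether the gap between the two models is larger than the fold-to-fold variation, which is the question a single 149/65 split cannot answer.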
