Module 1_ICP 4: Machine Learning with Scikit Learn - acikgozmehmet/PythonDeepLearning GitHub Wiki
Machine Learning with Scikit-Learn
Objectives:
The following topics are covered.
- Machine Learning
- Classification
- Scientific packages
Overview
a. Machine Learning algorithms
b. Classification algorithm
c. Scikit learn
d. Advanced concepts related to machine learning algorithms, such as overfitting, underfitting, cross-validation, and evaluation of clustering methods
In Class Programming
1. Find the correlation between ‘survived’ (target column) and ‘sex’ column for the Titanic use case in class.
Click here to get the source code
Do you think we should keep this feature?
The correlation between gender and survival is significant: survival rate is clearly associated with Sex = 1. This feature should therefore be kept for modelling.
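The check described above can be sketched roughly as follows. This is a minimal illustration rather than the class solution: the tiny DataFrame is made-up stand-in data, the column names `Sex` and `Survived` are assumed, and so is the encoding (female -> 1, male -> 0).

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic training set;
# the real data and its column names may differ.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male",
            "male", "female", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 1, 0, 0],
})

# Encode the categorical column numerically (assumed mapping).
df["Sex"] = df["Sex"].map({"female": 1, "male": 0})

# Pearson correlation between the encoded feature and the target.
corr = df["Sex"].corr(df["Survived"])

# Mean survival rate per encoded value: a more direct view of the same relationship.
survival_by_sex = df.groupby("Sex")["Survived"].mean()

print(f"correlation: {corr:.3f}")
print(survival_by_sex)
```

A correlation well away from zero, together with clearly different survival rates per group, is the kind of evidence that supports keeping the feature.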
2. Implement Naïve Bayes method using scikit-learn library
- Use dataset available in https://umkc.box.com/s/ea6wn1cidukan67t02j60nmp1ljln3kd
- Use train_test_split to create training and testing part
- Evaluate the model on testing part using score and classification_report(y_true, y_pred)
Click here to get the source code
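For orientation, the three steps of this task can be sketched as below. This is not the linked source code. It assumes scikit-learn's bundled wine dataset as a stand-in for the course dataset, `GaussianNB` as the Naïve Bayes variant, and a 70/30 split with a fixed `random_state`; all of these are illustrative choices.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data (assumption): the bundled wine dataset, not the course dataset.
X, y = load_wine(return_X_y=True)

# Hold out 30% of the samples for testing (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit Gaussian Naive Bayes on the training part.
model = GaussianNB()
model.fit(X_train, y_train)

# score() returns mean accuracy on the held-out part.
accuracy = model.score(X_test, y_test)

# Per-class precision, recall and F1 on the same held-out part.
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred)

print(f"accuracy: {accuracy:.3f}")
print(report)
```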
3. Implement linear SVM method using scikit-learn library
- Use the same dataset above
- Use train_test_split to create training and testing part
- Evaluate the model on testing part using score and classification_report(y_true, y_pred)
Click here to get the source code
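A comparable sketch for the linear SVM task follows, again not the linked source code. It assumes the bundled wine dataset as a stand-in and `SVC(kernel="linear")` as the linear SVM. The `StandardScaler` step is an addition not mentioned in the task; it is included because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the bundled wine dataset, not the course dataset.
X, y = load_wine(return_X_y=True)

# Same assumed 70/30 split and seed as one would use for the Naive Bayes run,
# so that both models are evaluated on identical test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scale features, then fit an SVM with a linear kernel.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Mean accuracy and per-class metrics on the held-out part.
accuracy = model.score(X_test, y_test)
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred)

print(f"accuracy: {accuracy:.3f}")
print(report)
```

Reusing the same `random_state` for both models keeps the comparison between them fair.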
Which algorithm gave you better accuracy? Can you justify why?
Comparing the accuracy of the two algorithms, Naïve Bayes scores higher than the linear SVM. This is plausible given the nature of the algorithms and of the data. The results also depend on the size of the dataset used for modelling (214 samples in total: 149 for training, 65 for testing) and on the number of classes (0, 1, 2, 3, 4, 5, 6, 7).
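With so few samples, a single train/test split can favour one model by chance. One way to make the comparison less dependent on the split is k-fold cross-validation, sketched below. The stand-in wine dataset, the choice of 5 folds, and the scaling step for the SVM are all assumptions, not part of the original exercise.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the bundled wine dataset, not the course dataset.
X, y = load_wine(return_X_y=True)

nb = GaussianNB()
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# 5-fold cross-validation; for classifiers scikit-learn stratifies the folds by class.
nb_scores = cross_val_score(nb, X, y, cv=5)
svm_scores = cross_val_score(svm, X, y, cv=5)

print(f"Naive Bayes: {nb_scores.mean():.3f} +/- {nb_scores.std():.3f}")
print(f"Linear SVM:  {svm_scores.mean():.3f} +/- {svm_scores.std():.3f}")
```

If the spread across folds is larger than the gap between the two means, the difference in accuracy is probably not meaningful.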