ICP4 - adrian6912/CS490PythonML GitHub Wiki

My code for this is located at the bottom of the "Preprocessing-EDA.py" file in the ICP4 folder on my GitHub.

The correlation between 'Sex' and 'Survived' in the preprocessed training data set is ~0.54. This indicates a fairly strong relationship between the two features, so we should keep 'Sex' as a feature in our models.
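As a rough illustration of how such a correlation can be computed with pandas, here is a minimal sketch. The data frame below is a made-up toy sample, not the real training data, and the encoding of 'Sex' (female as 1, male as 0) is an assumption about the preprocessing step, so the value it prints will not match the ~0.54 reported above.

```python
import pandas as pd

# Toy stand-in for the preprocessed training data. These rows are invented
# for illustration only and are NOT taken from the real data set.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 1, 0, 0],
})

# Assumed encoding: female -> 1, male -> 0 (the actual preprocessing may differ).
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Pearson correlation between the two columns (pandas' default method).
sex_survived_corr = df["Sex"].corr(df["Survived"])
print(f"Correlation between Sex and Survived: {sex_survived_corr:.2f}")
```

Note that the sign of the result depends on which value of 'Sex' is encoded as 1, so only the magnitude is meaningful when deciding whether to keep the feature.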

2, 3, 4. Read in the data as a pandas data frame and separate it into a single target (Type) and its associated features (everything else). Then split both into training and test sets, and create the Naive Bayes model, the SVM model with kernel=poly, and the SVM model with kernel=rbf. The results for these models vary widely. The Gaussian Naive Bayes (GNB) model does the worst, averaging an accuracy of around 0.45. The two SVMs differ wildly from run to run, but the rbf kernel gives the most stable accuracy.
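The steps above can be sketched roughly as follows with scikit-learn. This is not the code from the repository: the data here is synthetic (generated with `make_classification`), the feature names `f0`..`f8`, the 80/20 split, and the fixed `random_state` are all assumptions for illustration, and the accuracies it prints say nothing about the real data set.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the real data set (assumption: numeric features and
# a multi-class 'Type' target). The real data would be read with pd.read_csv.
X_arr, y_arr = make_classification(
    n_samples=200, n_features=9, n_informative=5, n_classes=3, random_state=0
)
data = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(9)])
data["Type"] = y_arr

# Separate the single target from its associated features.
X = data.drop(columns="Type")
y = data["Type"]

# Split into training and test sets. Fixing random_state makes the split
# reproducible; the 80/20 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three models described above.
models = {
    "GaussianNB": GaussianNB(),
    "SVC (poly)": SVC(kernel="poly"),
    "SVC (rbf)": SVC(kernel="rbf"),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

If the split is not seeded, each run trains and tests on different rows, which may be one reason the accuracies change from run to run. Averaging over several splits, for example with `cross_val_score`, is one way to get a steadier comparison.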