ICP5 - adrian6912/CS490PythonML GitHub Wiki

Import the data from train.csv. Compute the z-score of the GarageArea column and drop the rows that it marks as high-end (positive) outliers, then drop the rows where GarageArea is 0. Draw a scatter plot of the original data and a second scatter plot of the 'no_outliers' data.
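The outlier step could be sketched roughly as follows. This is a minimal illustration, not the code used for the assignment: the helper name `drop_garage_outliers` and the cutoff `z_max=3.0` are assumptions, since the notes do not state which threshold was applied.

```python
import numpy as np
import pandas as pd


def drop_garage_outliers(df, column="GarageArea", z_max=3.0):
    """Return a copy of df without high-end outliers and zeros in `column`.

    z_max is an assumed cutoff; the notes do not say which value was used.
    """
    values = df[column]
    # Population z-score: distance from the mean in standard deviations.
    z = (values - values.mean()) / values.std(ddof=0)
    # Keep rows that are not too far above the mean and are not zero.
    keep = (z <= z_max) & (values != 0)
    return df.loc[keep].copy()
```

With the real data this would be called as `no_outliers = drop_garage_outliers(pd.read_csv("train.csv"))`. Both frames can then be drawn with `DataFrame.plot.scatter`, with GarageArea on one axis and whichever column the exercise plots it against on the other.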
Import the data. Drop any rows that contain 0 values. Compute the correlation between all of the features. The features with the highest correlation to the target are alcohol, volatile acidity, and sulphates. Build one multiple regression model that uses all of the features, then a second one that uses only those three. Print the R-squared and RMSE values for each model.
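A rough sketch of these steps is shown below. It is an illustration under stated assumptions rather than the assignment's actual code: the function names are hypothetical, the fit uses ordinary least squares through `numpy.linalg.lstsq` instead of any particular modelling library, and both metrics are computed on the same rows used for fitting because the notes do not mention a train/test split.

```python
import numpy as np
import pandas as pd


def drop_zero_rows(df):
    """Return a copy of df without rows that contain a 0 in any column."""
    return df.loc[(df != 0).all(axis=1)].copy()


def top_correlated(df, target, k=3):
    """Return the k features with the largest absolute correlation to target."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


def fit_and_score(df, features, target):
    """Fit ordinary least squares on `features`; return (r_squared, rmse).

    Scores are computed on the fitting data itself (no held-out split).
    """
    X = df[features].to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)
    # Prepend a column of ones so the model has an intercept term.
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(ss_res / len(y)))
    return r_squared, rmse
```

Calling `fit_and_score` once with every feature column and once with only the alcohol, volatile acidity, and sulphates columns gives the two pairs of values to print. The target column name is not given in the notes, so it has to be supplied by the caller.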