python icp5 - koushikskr/python GitHub Wiki
Introduction:
This ICP covers the use of regression models and their types. We also used the example of food and milk yield to examine the relationship between a cow's food intake and its milk yield.
Question 1:
Delete all the outlier data for the GarageArea field (for the same data set in the use case: House Prices). For this task you need to plot the GarageArea field against SalePrice in a scatter plot, then check which values are anomalies.
Solution steps:
Imported pandas, read the given dataset and created a data frame.
Imported the matplotlib library for drawing the scatter plot.
Plotted a scatter plot of the GarageArea and SalePrice columns.
Removed the outliers by keeping only the rows with GarageArea greater than 400 and less than 1000 and SalePrice less than 500000, and stored the result in a new data frame.
Plotted a scatter plot of the new data frame after removing the outliers.
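
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used in the ICP: the helper names `remove_garage_outliers` and `plot_garage_vs_price` are hypothetical, the thresholds are the ones quoted in the steps, and the small data frame is made-up sample data standing in for the House Prices file.

```python
import pandas as pd


def remove_garage_outliers(df, min_area=400, max_area=1000, max_price=500000):
    """Keep only rows inside the thresholds quoted in the solution steps."""
    mask = (
        (df["GarageArea"] > min_area)
        & (df["GarageArea"] < max_area)
        & (df["SalePrice"] < max_price)
    )
    return df[mask].reset_index(drop=True)


def plot_garage_vs_price(df, title):
    """Draw a scatter plot of GarageArea against SalePrice."""
    # Imported here so the filtering helper also works without matplotlib.
    import matplotlib.pyplot as plt

    plt.scatter(df["GarageArea"], df["SalePrice"], alpha=0.6)
    plt.xlabel("GarageArea")
    plt.ylabel("SalePrice")
    plt.title(title)
    plt.show()


# Made-up sample rows (assumption); the real data comes from the House Prices CSV.
sample = pd.DataFrame({
    "GarageArea": [0, 450, 520, 980, 1200, 600],
    "SalePrice": [90000, 150000, 210000, 320000, 250000, 700000],
})

cleaned = remove_garage_outliers(sample)
print(len(sample), "rows before,", len(cleaned), "rows after")
```

With the real dataset, the sample frame would be replaced by something like `pd.read_csv(...)` on the provided file, and `plot_garage_vs_price` would be called once on the original frame and once on `cleaned` to compare the two plots.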
Question 2:
Create a multiple regression model for the “wine quality” dataset. In this data set “quality” is the target label. Evaluate the model using RMSE and R2 score. You need to delete the null values in the data set, and you need to find the top 3 features most correlated with the target label (quality).
Solution steps:
Imported the given dataset and created a data frame from it.
Selected the numeric features from the data frame.
Computed the correlation of each numeric feature with the quality column.
Checked for null values using the isnull() method.
Filled the missing values using the interpolate() method.
Created a linear regression model.
Split the data into train and test sets.
Trained the model on the train data.
Computed the R2 score and the RMSE.
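
A rough sketch of these steps is shown below. It assumes scikit-learn for the model, the split and the metrics; the helper names, the 80/20 split, the fixed random seed, and the choice to train on all numeric predictors are illustrative assumptions, and the data frame is synthetic data that only borrows a few column names from the wine quality dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def top_correlated_features(df, target="quality", k=3):
    """Return the k numeric features with the largest absolute correlation to target."""
    numeric = df.select_dtypes(include=[np.number])
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


def fit_and_evaluate(df, target="quality", test_size=0.2, random_state=42):
    """Fit a linear regression on all other columns and report RMSE and R2."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    return model, rmse, r2_score(y_test, pred)


# Synthetic stand-in for the wine quality data (assumption, not real measurements).
rng = np.random.default_rng(0)
n = 500
wine = pd.DataFrame({
    "alcohol": rng.uniform(8, 14, n),
    "volatile acidity": rng.uniform(0.1, 1.2, n),
    "sulphates": rng.uniform(0.3, 1.5, n),
    "pH": rng.uniform(2.8, 3.8, n),
})
wine["quality"] = (
    wine["alcohol"] - 3.0 * wine["volatile acidity"]
    + 2.0 * wine["sulphates"] + rng.normal(0, 0.3, n)
)
wine.loc[[5, 50], "pH"] = np.nan  # inject a few missing values

print("nulls before cleaning:", int(wine.isnull().sum().sum()))
# Fill gaps by interpolation, then drop any rows that still contain nulls.
clean = wine.interpolate().dropna()

top3 = top_correlated_features(clean)
model, rmse, r2 = fit_and_evaluate(clean)
print("top 3 features:", top3)
print("RMSE:", rmse, "R2:", r2)
```

Ranking by absolute correlation means that strongly negative relationships also count as "most correlated"; if only positive correlations are wanted, the `.abs()` call can be dropped.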