ICP 5 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki
Python and Deep Learning: Special Topics
Rajeshwari Sai Aishwarya Puppala
Student ID: 16298162
Class ID: 35
In-class programming: 5
Objectives:
1. Delete all the outlier data for the GarageArea field (for the same data set as in the use case: House Prices).
For this task, plot the GarageArea field against SalePrice in a scatter plot, then check which values are anomalies.
2. Create a multiple regression model for the “wine quality” dataset. In this dataset, “quality” is the target label. Evaluate the model using RMSE and the R2 score.
** You need to delete the null values in the dataset.
** You need to find the top 3 features most correlated with the target label (quality).
Removing the Outliers
- Imported the required libraries: pandas, matplotlib, pathlib.
- Created a data frame from the dataset /train.csv.
- Drew a scatter plot of GarageArea against SalePrice to spot the anomalies.
- In the plot, most of the points have a GarageArea between 200 and 1000.
- Removed the outliers, i.e. the rows with a GarageArea below 200 or above 1000.
- Plotted the data again after removing the outliers.
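The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the original page: the helper names `remove_garage_outliers` and `plot_garage_vs_price` are hypothetical, and the small `sample` frame is made-up data standing in for train.csv. It assumes the cut-offs of 200 and 1000 are inclusive.

```python
import pandas as pd


def remove_garage_outliers(df, low=200, high=1000):
    """Keep only rows whose GarageArea lies within [low, high] (inclusive)."""
    mask = df["GarageArea"].between(low, high)
    return df[mask]


def plot_garage_vs_price(df, title):
    """Scatter plot of GarageArea against SalePrice."""
    import matplotlib.pyplot as plt  # lazy import: only needed when plotting

    plt.scatter(df["GarageArea"], df["SalePrice"], alpha=0.6)
    plt.xlabel("GarageArea")
    plt.ylabel("SalePrice")
    plt.title(title)
    plt.show()


# Made-up sample standing in for train.csv (values are illustrative only).
sample = pd.DataFrame({
    "GarageArea": [0, 150, 250, 480, 720, 990, 1200, 1400],
    "SalePrice": [90000, 110000, 140000, 180000, 230000, 300000, 250000, 160000],
})

cleaned = remove_garage_outliers(sample)
print(len(sample), "rows before,", len(cleaned), "rows after")
```

With the real data, the frame would presumably be loaded with `pd.read_csv` on the train.csv file, and `plot_garage_vs_price` would be called once before and once after filtering to compare the two plots.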
Code
Output
Garage Values
With Anomalies
Without Anomalies
Multiple Regression - R2 and RMSE Score
- Imported the required libraries: pandas, matplotlib, pathlib.
- Created a data frame from the dataset /winequality_red.csv.
- Found the top 3 features most correlated with the target.
- Checked for null values in the dataset and removed them.
- Divided the dataset into the dependent variable and the independent variables.
- Split the dataset into X_train, X_test, y_train, and y_test, with 33% test data and 67% train data.
- Initialized the linear model.
- Fit the model to the training data.
- Calculated the R^2 and RMSE scores for the model.
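A rough sketch of these steps is shown below. It is not the original code from the page. It assumes scikit-learn is used for the split, the linear model, and the metrics (the page does not name the library), and it fits on all remaining columns rather than only the top 3. The function names `top_correlated` and `fit_and_score` are hypothetical, and the `wine` frame is synthetic data standing in for winequality_red.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def top_correlated(df, target="quality", k=3):
    """Return the k feature names with the largest absolute correlation to target."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


def fit_and_score(df, target="quality", seed=0):
    """Drop nulls, split 67/33, fit a linear model, return (rmse, r2) on the test set."""
    df = df.dropna()
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return rmse, float(r2_score(y_test, pred))


# Synthetic stand-in for winequality_red.csv (column values are made up).
rng = np.random.default_rng(0)
n = 500
wine = pd.DataFrame({
    "alcohol": rng.normal(size=n),
    "sulphates": rng.normal(size=n),
    "volatile_acidity": rng.normal(size=n),
    "pH": rng.normal(size=n),
})
wine["quality"] = (3 * wine["alcohol"] + 2 * wine["sulphates"]
                   - wine["volatile_acidity"] + 0.1 * rng.normal(size=n))
wine.loc[0, "alcohol"] = np.nan  # one missing value to exercise dropna()

top3 = top_correlated(wine)
rmse, r2 = fit_and_score(wine)
print("top 3 features:", top3)
print(f"RMSE={rmse:.3f}  R2={r2:.3f}")
```

Ranking by absolute correlation treats strongly negative features as equally informative as strongly positive ones. To fit on the top 3 features only, the frame passed to `fit_and_score` could be restricted to `wine[top3 + ["quality"]]`.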
Code
Note: The lower the RMSE and the closer R2 is to 1, the better the model fits the data.
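To make the two metrics concrete, here is a small worked example that computes them from their standard definitions: RMSE is the square root of the mean squared residual, and R2 is 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `rmse_and_r2` and the four data points are made up for illustration.

```python
import numpy as np


def rmse_and_r2(y_true, y_pred):
    """Compute RMSE and R2 directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))          # root of the mean squared error
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(rmse), float(1.0 - ss_res / ss_tot)


# Residuals are 1, 0, -1, 0, so MSE = 0.5; SS_res = 2 and SS_tot = 20.
example_rmse, example_r2 = rmse_and_r2([3, 5, 7, 9], [2, 5, 8, 9])
print(example_rmse, example_r2)
```

Here the RMSE is the square root of 0.5 (about 0.707) and R2 is 1 - 2/20 = 0.9. A perfect prediction would give an RMSE of 0 and an R2 of exactly 1.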