ICP5 - PallaviArikatla/Python GitHub Wiki
OBJECTIVE: Execute programs using Regression Model.
Software used: PyCharm, Python 3 / Python 2.
QUESTION 1: Delete all the outlier data for the GarageArea field (for the same dataset as in the use case: House Prices).
- Start by importing the libraries used for reading the data and plotting the graphs.
- Read the given dataset.
- Plot a scatter graph of GarageArea against SalePrice for the original dataset.
- Filter the data to exclude outliers, choosing the GarageArea cut-offs by inspecting the scatter graph or the raw values.
- Plot another scatter graph for the filtered data and print the GarageArea and SalePrice values.
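The steps above can be sketched roughly as follows. This is a minimal illustration rather than the code used for this exercise: the helper names `remove_garage_outliers` and `plot_garage_vs_price` are hypothetical, the cut-offs `low=0` and `high=1200` are assumed values of the kind one might read off the scatter graph, and the small DataFrame is made-up data standing in for the House Prices file.

```python
import pandas as pd


def remove_garage_outliers(train, low=0, high=1200):
    """Return only the rows whose GarageArea lies strictly between low and high.

    The default cut-offs are assumptions; pick them from the scatter graph.
    """
    keep = (train["GarageArea"] > low) & (train["GarageArea"] < high)
    return train[keep]


def plot_garage_vs_price(data, title):
    """Draw a scatter graph of GarageArea against SalePrice."""
    # Imported here so that the filtering step also works without matplotlib.
    import matplotlib.pyplot as plt

    plt.scatter(data["GarageArea"], data["SalePrice"], color="blue")
    plt.xlabel("GarageArea")
    plt.ylabel("SalePrice")
    plt.title(title)
    plt.show()


# Made-up sample rows (not taken from the real dataset).
train = pd.DataFrame({
    "GarageArea": [0, 280, 480, 576, 840, 1390],
    "SalePrice": [90000, 120000, 175000, 210000, 320000, 160000],
})

filtered = remove_garage_outliers(train)
print(filtered[["GarageArea", "SalePrice"]])
```

Calling `plot_garage_vs_price(train, "Actual dataset")` and `plot_garage_vs_price(filtered, "Filtered data")` would then draw the two graphs. Excluding a GarageArea of 0 treats houses without a garage as outliers, which is itself a judgment call.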
CODE:
OUTPUT FOR ACTUAL DATASET:
OUTPUT FOR FILTERED DATA:
PRINTED VALUES:
QUESTION 2: Create a Multiple Regression model for the “wine quality” dataset. In this dataset “quality” is the target label. Evaluate the model using RMSE and the R² score.
- Here we work with another dataset, winequality, in which the target is quality.
- Obtain the numeric features of the given dataset.
- As quality is our target, find its correlation with the other features.
- Count the null values per column, then keep the numeric columns and handle the missing values (interpolate, then drop any rows that are still incomplete) using the following commands respectively:
nulls = pd.DataFrame(train.isnull().sum().sort_values(ascending=False)[:25])
data = train.select_dtypes(include=[np.number]).interpolate().dropna()
- Split the data into train and test sets and evaluate the model.
- Evaluation is done using Root Mean Square Error (RMSE) and the R² score.
- An RMSE close to 0 and an R² score close to 1 indicate an accurate model.
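A rough sketch of the workflow above, assuming scikit-learn's `LinearRegression` is used as the multiple regression model. The function name `evaluate_multiple_regression`, the `test_size` and `seed` defaults, and the synthetic DataFrame are illustrative assumptions; the column values are generated, not taken from the real wine data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_multiple_regression(train, target="quality", test_size=0.33, seed=42):
    """Fit a linear model on all numeric features; return (rmse, r2, correlations)."""
    # Keep numeric columns, interpolate gaps, drop rows that are still incomplete.
    data = train.select_dtypes(include=[np.number]).interpolate().dropna()

    # Correlation of each feature with the target, highest first.
    corr = data.corr()[target].drop(target).sort_values(ascending=False)

    X = data.drop(columns=[target])
    y = data[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)

    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    r2 = r2_score(y_test, predictions)
    return rmse, r2, corr


# Synthetic stand-in for the wine data (all values are made up).
rng = np.random.default_rng(0)
n = 200
alcohol = rng.uniform(8.0, 14.0, n)
sulphates = rng.uniform(0.3, 1.2, n)
volatile_acidity = rng.uniform(0.1, 1.0, n)
quality = (0.5 * alcohol + 1.0 * sulphates - 2.0 * volatile_acidity
           + rng.normal(0.0, 0.3, n))
wine = pd.DataFrame({
    "alcohol": alcohol,
    "sulphates": sulphates,
    "volatile acidity": volatile_acidity,
    "quality": quality,
})
wine.loc[5, "sulphates"] = np.nan  # one missing value to exercise interpolate()

rmse, r2, corr = evaluate_multiple_regression(wine)
print(f"RMSE: {rmse:.3f}")
print(f"R^2:  {r2:.3f}")
print(corr)
```

RMSE is expressed in the units of the target, so a lower value means smaller prediction errors, while R² measures the share of variance explained and is at most 1.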
CODE:
OUTPUT: