ICP5 - PallaviArikatla/Python GitHub Wiki

OBJECTIVE: Execute programs using regression models.

Software used: PyCharm, Python 3 / Python 2.

QUESTION 1: Delete all the outlier data for the GarageArea field (for the same dataset as in the use case: House Prices).

  • Initialize by importing all the libraries that are used in plotting the graphs.

  • Read the given dataset.

  • Plot a scatter graph of GarageArea against SalePrice for the original dataset.

  • Filter the data to exclude outliers, choosing the cutoff values by inspecting the scatter graph or the dataset itself.

  • Plot another scatter graph for the filtered data and print GarageArea and SalePrice values.
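The steps above can be sketched as follows. This is a minimal illustration, not the original solution: the cutoffs (`low=0`, `high=1200`), the helper names, and the tiny sample frame are all assumptions made for the example. With the real data you would pick the cutoffs from your own scatter graph.

```python
import pandas as pd


def remove_garage_outliers(df, low=0, high=1200):
    """Keep rows whose GarageArea lies strictly between low and high.

    The default cutoffs are illustrative assumptions; choose real ones
    by looking at the scatter graph of the actual dataset.
    """
    mask = (df["GarageArea"] > low) & (df["GarageArea"] < high)
    return df[mask]


def plot_garage_vs_price(df, title):
    """Scatter graph of GarageArea (x axis) against SalePrice (y axis)."""
    # Imported here so the filtering step works without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.scatter(df["GarageArea"], df["SalePrice"], alpha=0.75)
    plt.xlabel("GarageArea")
    plt.ylabel("SalePrice")
    plt.title(title)
    plt.show()


# Made-up rows standing in for the House Prices data. With the real file
# you would load it with pd.read_csv(...) instead.
sample = pd.DataFrame({
    "GarageArea": [0, 480, 550, 1390, 300],
    "SalePrice": [90000, 180000, 210000, 160000, 140000],
})

filtered = remove_garage_outliers(sample)
print(filtered[["GarageArea", "SalePrice"]])

# To draw the two graphs:
# plot_garage_vs_price(sample, "Actual dataset")
# plot_garage_vs_price(filtered, "Filtered data")
```

A boolean mask keeps the filter to one readable line and leaves the original frame unchanged, so both graphs can be drawn from the same session.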

CODE:

OUTPUT FOR ACTUAL DATASET:

OUTPUT FOR FILTERED DATA:

PRINTED VALUES:

QUESTION 2: Create a multiple regression model for the “wine quality” dataset. In this dataset, “quality” is the target label. Evaluate the model using RMSE and the R² score.

  • Here we'll be working on another dataset called winequality in which our target is quality.

  • Obtain numeric features of the given dataset.

  • As quality is our target, find its correlation with the other features.

  • Count the null values in each column, then fill gaps by interpolation and drop any rows that still have missing values, using the following commands respectively:

nulls = pd.DataFrame(train.isnull().sum().sort_values(ascending=False)[:25])

data = train.select_dtypes(include=[np.number]).interpolate().dropna()

  • Split the given data into train and test sets, fit the model, and evaluate it.

  • Evaluation is done using Root Mean Square Error (RMSE) and the R² score.

  • The closer the RMSE is to 0 and the closer the R² score is to 1, the better the model fits the data.
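The steps above can be sketched as follows. This is a hedged example rather than the original solution: the helper name, the 33% test split, the fixed seed, and the synthetic columns (`alcohol`, `acidity`) are assumptions, used so the snippet runs without the winequality file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_multiple_regression(df, target="quality", test_size=0.33, seed=42):
    """Fit a multiple linear regression and return (rmse, r2) on the test split."""
    # Keep numeric features, fill gaps by interpolation, drop what is left.
    data = df.select_dtypes(include=[np.number]).interpolate().dropna()

    X = data.drop(columns=[target])
    y = data[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)

    rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
    r2 = float(r2_score(y_test, predictions))
    return rmse, r2


# Synthetic stand-in for the wine data: quality depends linearly on two
# made-up features plus a little noise.
rng = np.random.default_rng(0)
alcohol = rng.uniform(8, 14, size=300)
acidity = rng.uniform(0.2, 1.0, size=300)
quality = 3 + 0.5 * alcohol - 2 * acidity + rng.normal(0, 0.1, size=300)
wine = pd.DataFrame({"alcohol": alcohol, "acidity": acidity, "quality": quality})

# Correlation of every numeric feature with the target, strongest first.
correlations = wine.corr()["quality"].sort_values(ascending=False)
print(correlations)

rmse, r2 = evaluate_multiple_regression(wine)
print("RMSE:", rmse)
print("R2 score:", r2)
```

RMSE is reported in the same units as quality, so lower is better, while an R² score near 1 means the features explain most of the variance in the target.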

CODE:

OUTPUT: