Wiki Report for ICP5 - NagaSurendraBethapudi/Python-ICP GitHub Wiki

Video Link : https://drive.google.com/file/d/17VqrIgRD3GzDLyr7LuAw-7BMsBh6y1Gu/view?usp=sharing

Question 1 :

Delete all the outlier data for the GarageArea field

Explanation :

Imported Libraries : 1. import pandas as pd 2. import numpy as np 3. import matplotlib.pyplot as plt
Imported Data : https://umkc.box.com/s/mn4mjpsq0pf0ql7prhetu534cxbxnkcu
Found the quartiles of GarageArea using a boxplot
Removed outliers by computing 1. np.percentile(data.GarageArea, 25) and 2. np.percentile(data.GarageArea, 75), then keeping only the rows between those values (about 334 and 576): data[(data.GarageArea>334) & (data.GarageArea<576)]
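Keeping only values strictly between the 25th and 75th percentiles retains roughly the middle half of the rows, so it also discards many values that are not outliers. A common alternative, swapped in here for comparison, is Tukey's 1.5 × IQR fence. This is a minimal sketch that assumes a DataFrame named `data` with a numeric `GarageArea` column; the helper `remove_iqr_outliers` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return the rows whose `column` value lies inside the Tukey fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, where IQR = Q3 - Q1.
    """
    q1 = np.percentile(df[column], 25)
    q3 = np.percentile(df[column], 75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep values on or inside the fences; anything outside is treated as an outlier.
    return df[(df[column] >= lower) & (df[column] <= upper)]


# Hypothetical sample standing in for the real GarageArea column.
data = pd.DataFrame({"GarageArea": [0, 300, 350, 400, 450, 480, 500, 550, 600, 1400]})
cleaned = remove_iqr_outliers(data, "GarageArea")
print(len(data), "->", len(cleaned))
```

On this toy sample only the extreme values 0 and 1400 fall outside the fences. The middle rows are all kept.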
Output :

Question 2 : Restaurant Revenue Prediction using dataset: https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj

Explanation :

Imported Libraries : 1. import pandas as pd 2. import numpy as np 3. import matplotlib.pyplot as plt
Imported Data : https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj

Performed basic exploratory analysis
Converted the categorical string columns (City Group, Type) to integers using this mapping:

convert = {"City Group": {"Big Cities": 0, "Other": 1}, "Type" : {"FC" : 0, "IL" : 1, "DT" : 2}}
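The step that applies this mapping is not shown. One way to do it is to call `Series.map` on each column. This is a minimal sketch that assumes a toy DataFrame with the same `City Group` and `Type` columns; the rows are hypothetical.

```python
import pandas as pd

convert = {"City Group": {"Big Cities": 0, "Other": 1},
           "Type": {"FC": 0, "IL": 1, "DT": 2}}

# Hypothetical rows standing in for the restaurant data.
data = pd.DataFrame({
    "City Group": ["Big Cities", "Other", "Other"],
    "Type": ["FC", "IL", "DT"],
})

# For each column, replace every string label with its integer code.
# Labels missing from the mapping become NaN, which makes unmapped categories easy to spot.
for column, mapping in convert.items():
    data[column] = data[column].map(mapping)

print(data)
```

Using `map` rather than `replace` is a design choice. `replace` silently leaves unlisted labels as strings, while `map` turns them into NaN, which shows up immediately in `data.isnull().sum()`.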

Split the data into train and test sets
Evaluated the performance using R^2 and RMSE

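from sklearn.metrics import mean_squared_error
# model.score returns R^2 (coefficient of determination) on the test set
print("R squared : \n", model.score(x_test, y_test))
predictions = model.predict(x_test)
# mean_squared_error returns the MSE; RMSE is its square root
mse = mean_squared_error(y_test, predictions)
print("MSE : \n", mse)
print("RMSE : \n", np.sqrt(mse))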
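For reference, here is a self-contained sketch of the split, fit, and evaluate steps. The report does not name the model type, so `LinearRegression` is an assumption. The synthetic features and the 80/20 split are also assumptions. `model.score` returns R^2, and RMSE is the square root of `mean_squared_error`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 3 features, a linear target plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows for testing (assumed split ratio).
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(x_train, y_train)
predictions = model.predict(x_test)

r2 = model.score(x_test, y_test)  # same value as r2_score(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print("R squared :", r2)
print("RMSE :", rmse)
```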

Output :

Question 3 : Restaurant Revenue Prediction using dataset: https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj with the most highly correlated features

Explanation :

Imported Libraries : 1. import pandas as pd 2. import numpy as np 3. import matplotlib.pyplot as plt
Imported Data : https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj

Performed basic exploratory analysis
Converted the categorical string columns (City Group, Type) to integers using this mapping:

convert = {"City Group": {"Big Cities": 0, "Other": 1}, "Type" : {"FC" : 0, "IL" : 1, "DT" : 2}}

Split the data into train and test sets
Evaluated the performance using R^2 and RMSE

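from sklearn.metrics import mean_squared_error
# model.score returns R^2 (coefficient of determination) on the test set
print("R squared : \n", model.score(x_test, y_test))
predictions = model.predict(x_test)
# mean_squared_error returns the MSE; RMSE is its square root
mse = mean_squared_error(y_test, predictions)
print("MSE : \n", mse)
print("RMSE : \n", np.sqrt(mse))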

Found the features most correlated with revenue using :

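numeric_features = data.select_dtypes(include=[np.number])
corr = numeric_features.corr()
# Top 6 by correlation with revenue (the first entry is revenue itself, at 1.0)
print(corr['revenue'].sort_values(ascending=False)[:6], '\n')
# Bottom 6: the most negatively correlated features
print(corr['revenue'].sort_values(ascending=False)[-6:])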
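The correlation ranking can then be turned into the feature list used for the reduced model. This is a minimal sketch that assumes a numeric DataFrame with a `revenue` target column. The column names `P1` to `P3`, the choice of the top 2 features, and ranking by absolute correlation (so strongly negative features also count) are all assumptions, not necessarily what was done in this assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical data: revenue depends strongly on P1, weakly on P2, and not on P3.
rng = np.random.default_rng(1)
n = 300
data = pd.DataFrame({
    "P1": rng.normal(size=n),
    "P2": rng.normal(size=n),
    "P3": rng.normal(size=n),
})
data["revenue"] = 3 * data["P1"] - 1 * data["P2"] + rng.normal(scale=0.5, size=n)

numeric_features = data.select_dtypes(include=[np.number])
corr = numeric_features.corr()

# Rank features by |correlation| with the target, excluding the target itself.
ranked = corr["revenue"].drop("revenue").abs().sort_values(ascending=False)
top_features = ranked.index[:2].tolist()
print(ranked)
print("Selected:", top_features)

# These columns would then replace the full feature set as the model input.
X = data[top_features]
y = data["revenue"]
```

`X` and `y` can then be split and scored exactly as in the evaluation step, so the R^2 and error values of the full and reduced models are directly comparable.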

Output :

Conclusion : Using only the most highly correlated features, R^2 improved from -0.66 to 0.05, and the mean squared error dropped from 0.457 to 0.16.

Learning :

Learned about regression models
Converting categorical strings to integers
Finding feature correlations with the target

Challenges :

No major challenges; everything worked as expected.