Wiki Report for ICP5 - NagaSurendraBethapudi/Python-ICP GitHub Wiki
Video Link : https://drive.google.com/file/d/17VqrIgRD3GzDLyr7LuAw-7BMsBh6y1Gu/view?usp=sharing
Question 1 :
Delete all the outlier data for the GarageArea field
Explanation :
- Imported Libraries :
1. import pandas as pd
2. import numpy as np
3. import matplotlib.pyplot as plt
- Imported Data :
https://umkc.box.com/s/mn4mjpsq0pf0ql7prhetu534cxbxnkcu
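- Found the quartiles using boxplots
- Removed outliers by computing the 25th and 75th percentiles of GarageArea and keeping only the rows between the chosen bounds
1. np.percentile(data.GarageArea, 25)
2. np.percentile(data.GarageArea, 75)
3. data[(data.GarageArea>334) & (data.GarageArea<576)]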
- Output :
https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj
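The filter in the steps above keeps only rows strictly between 334 and 576, so it also drops many ordinary values. A minimal sketch of a common alternative, the 1.5 x IQR rule, is shown here for comparison. It is not the approach used in this report, and the GarageArea values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the housing data (assumption, not the real dataset).
data = pd.DataFrame({"GarageArea": [0, 280, 334, 400, 480, 520, 576, 600, 1400]})

# Quartiles, computed the same way as in the steps above.
q1 = np.percentile(data.GarageArea, 25)
q3 = np.percentile(data.GarageArea, 75)
iqr = q3 - q1

# 1.5 x IQR fences: values outside [lower, upper] are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

filtered = data[(data.GarageArea >= lower) & (data.GarageArea <= upper)]
print(filtered)
```

With these sample values only the 1400 row is removed, while the strict quartile filter would also remove every row at or outside 334 and 576.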
Question 2 : Restaurant Revenue Prediction using dataset:
Explanation :
- Imported Libraries :
1. import pandas as pd
2. import numpy as np
3. import matplotlib.pyplot as plt
- Imported Data :
https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj
- Performed basic analysis
- Converted string columns to int
convert = {"City Group": {"Big Cities": 0, "Other": 1}, "Type" : {"FC" : 0, "IL" : 1, "DT" : 2}}
- Split the data into train and test sets
- Evaluated the performance using R2 and RMSE (mean_squared_error returns the MSE, so its square root is taken to get the RMSE)
1. print ("R squared score : \n", model.score(x_test, y_test))
2. predictions = model.predict(x_test)
3. from sklearn.metrics import mean_squared_error
4. print ('RMSE error : \n', np.sqrt(mean_squared_error(y_test, predictions)))
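The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is randomly generated, the column `P1` is a placeholder feature, `LinearRegression` is assumed as the model (the report does not name it), `Series.map` is one possible way to apply the `convert` dictionary, and `test_size` and `random_state` are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for the restaurant data (assumption).
rng = np.random.default_rng(0)
n = 60
data = pd.DataFrame({
    "City Group": rng.choice(["Big Cities", "Other"], size=n),
    "Type": rng.choice(["FC", "IL", "DT"], size=n),
    "P1": rng.integers(1, 10, size=n),
})
data["revenue"] = 2.0 * data["P1"] + rng.normal(0, 1, size=n)

# Convert string categories to integers with the mapping from the report.
convert = {"City Group": {"Big Cities": 0, "Other": 1},
           "Type": {"FC": 0, "IL": 1, "DT": 2}}
for column, mapping in convert.items():
    data[column] = data[column].map(mapping).astype(int)

# Split features and target into train and test sets.
x = data.drop(columns=["revenue"])
y = data["revenue"]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)

# Fit the assumed model and evaluate it on the held-out rows.
model = LinearRegression().fit(x_train, y_train)
r2 = model.score(x_test, y_test)
predictions = model.predict(x_test)
rmse = np.sqrt(mean_squared_error(y_test, predictions))

print("R squared score :", r2)
print("RMSE :", rmse)
```

Taking the square root of `mean_squared_error` keeps the reported error in the same units as `revenue`.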
Output :
https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj
Question 3 : Restaurant Revenue Prediction using dataset with top most correlated features:
Explanation :
- Imported Libraries :
1. import pandas as pd
2. import numpy as np
3. import matplotlib.pyplot as plt
- Imported Data :
https://umkc.box.com/s/ac6vql1s466ss2b99ifetvh9g1yuj1uj
- Performed basic analysis
- Converted string columns to int
convert = {"City Group": {"Big Cities": 0, "Other": 1}, "Type" : {"FC" : 0, "IL" : 1, "DT" : 2}}
- Split the data into train and test sets
- Evaluated the performance using R2 and RMSE (mean_squared_error returns the MSE, so its square root is taken to get the RMSE)
1. print ("R squared score : \n", model.score(x_test, y_test))
2. predictions = model.predict(x_test)
3. from sklearn.metrics import mean_squared_error
4. print ('RMSE error : \n', np.sqrt(mean_squared_error(y_test, predictions)))
- Found the top correlated features using :
1. numeric_features = data.select_dtypes(include=[np.number])
2. corr = numeric_features.corr()
3. print (corr['revenue'].sort_values(ascending=False)[:6], '\n')
4. print (corr['revenue'].sort_values(ascending=False)[-6:])
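Once the correlations are printed, the selected columns still have to be turned into a feature matrix. A minimal sketch of one way to do that is shown below. The data is randomly generated, the column names `P1`, `P2`, `P3` are placeholders, and ranking by absolute correlation is an assumption made here so that strongly negative features are kept as well; the report itself inspects the six highest and six lowest signed values.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data (assumption): revenue depends on P1 and P2 only.
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({
    "P1": rng.normal(size=n),
    "P2": rng.normal(size=n),
    "P3": rng.normal(size=n),
})
data["revenue"] = 3.0 * data["P1"] - 2.0 * data["P2"] + rng.normal(0, 0.5, size=n)

# Correlation matrix over the numeric columns, as in the steps above.
numeric_features = data.select_dtypes(include=[np.number])
corr = numeric_features.corr()

# Rank features by absolute correlation with the target, excluding the target itself.
ranked = corr["revenue"].drop("revenue").abs().sort_values(ascending=False)

# Keep the two strongest features and build the reduced feature matrix.
top_features = ranked.head(2).index.tolist()
x_top = data[top_features]
print(top_features)
```

The reduced matrix `x_top` can then be passed to the same train/test split and model as before.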
Output :
Conclusion : Using only the topmost correlated features improved the R^2 score from -0.66 to 0.05 and reduced the RMSE from 0.457 to 0.16
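A negative R^2 such as -0.66 means the model predicted worse than simply guessing the mean of the test targets. The sketch below illustrates this with made-up numbers. The helper names `r2_manual` and `rmse_manual` are hypothetical, and the formula used is the standard coefficient of determination, 1 - SS_res / SS_tot.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of the mean baseline
    return 1.0 - ss_res / ss_tot

def rmse_manual(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Made-up targets and two sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
mean_baseline = [2.5, 2.5, 2.5, 2.5]    # always predicts the mean
bad_predictions = [4.0, 3.0, 2.0, 1.0]  # worse than the mean

print(r2_manual(y_true, mean_baseline))    # 0.0
print(r2_manual(y_true, bad_predictions))  # -3.0
print(rmse_manual(y_true, bad_predictions))
```

Predicting the mean gives an R^2 of exactly 0, so any score below 0 indicates a model that does worse than that baseline.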
Learning :
- Learned about regression
- Converting strings to int
- Finding the correlation
Challenges :
- Everything looks good.