Linear Regression - setiamanlhc/python-snippet-code GitHub Wiki
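Create dummy variables
import pandas as pd
# cols_categorical_to_transform is a list of the categorical column names in df
df_dummies = pd.get_dummies(df[cols_categorical_to_transform], drop_first=True)  # drop_first avoids a redundant, collinear column
df = pd.concat([df, df_dummies], axis=1)  # the original categorical columns stay in df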
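As a minimal sketch of the encoding step, the example below runs it on a made-up frame. The `Gender` column, its values, and the name `df_encoded` are assumptions for illustration and do not come from the original data. Unlike the snippet above, it also drops the original categorical column, which is optional when the features are later selected by name.

```python
import pandas as pd

# Toy data; the column names and values are hypothetical.
df = pd.DataFrame({
    "Height": [180.0, 165.0, 172.0, 158.0],
    "Gender": ["M", "F", "M", "F"],
})
cols_categorical_to_transform = ["Gender"]

# One column per category level, minus the first level (here "F").
df_dummies = pd.get_dummies(df[cols_categorical_to_transform], drop_first=True)

# Replace the string column with its numeric dummy column.
df_encoded = pd.concat(
    [df.drop(columns=cols_categorical_to_transform), df_dummies], axis=1
)
print(df_encoded.columns.tolist())
```

With `drop_first=True`, a two-level column such as this one yields a single indicator column, so the dropped level becomes the baseline that the regression coefficient is measured against.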
Train Test Split
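from sklearn import model_selection
target = 'Height'  # column to predict
features = ['Father', 'Mother', 'Kids']  # predictor columns
# Hold out 20% of the rows for testing; random_state makes the shuffle reproducible
x_train, x_test, y_train, y_test = model_selection.train_test_split(df[features], df[target], test_size=0.2, random_state=2020)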
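To show what the split returns, here is a small sketch on a made-up ten-row frame (the values are placeholders, and the `*_again` names are introduced only for this example). It checks the 80/20 sizes and that repeating the call with the same `random_state` selects the same rows.

```python
import numpy as np
import pandas as pd
from sklearn import model_selection

# Ten placeholder rows; the numbers carry no meaning.
df = pd.DataFrame({
    "Father": np.arange(10, dtype=float),
    "Mother": np.arange(10, 20, dtype=float),
    "Kids": np.arange(1, 11),
    "Height": np.arange(100, 110, dtype=float),
})
target = "Height"
features = ["Father", "Mother", "Kids"]

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    df[features], df[target], test_size=0.2, random_state=2020
)
# Same arguments, same random_state: the same rows are selected.
x_train_again, x_test_again, _, _ = model_selection.train_test_split(
    df[features], df[target], test_size=0.2, random_state=2020
)

print(x_train.shape, x_test.shape)
print(x_test.index.equals(x_test_again.index))
```

The features and target are split together, so `x_train` and `y_train` keep matching row indices.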
Import the specific model type
from sklearn import linear_model
from sklearn import metrics
Instantiate, Train, and Predict (3 Core Steps)
model = linear_model.LinearRegression()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
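The three steps can be tried end to end without the original data. The sketch below builds a synthetic frame in which `Height` depends on `Father` and `Mother` through made-up coefficients (0.4 and 0.3, with `Kids` having no effect), then checks that the fitted model roughly recovers them. All numbers here are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model, model_selection

rng = np.random.default_rng(2020)
n = 500

# Synthetic predictors; the distributions are arbitrary.
df = pd.DataFrame({
    "Father": rng.normal(175, 6, n),
    "Mother": rng.normal(162, 6, n),
    "Kids": rng.integers(1, 9, n),  # integers 1..8
})
# Made-up linear relationship plus noise; Kids has a true coefficient of 0.
df["Height"] = 40 + 0.4 * df["Father"] + 0.3 * df["Mother"] + rng.normal(0, 5, n)

target = "Height"
features = ["Father", "Mother", "Kids"]
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    df[features], df[target], test_size=0.2, random_state=2020
)

model = linear_model.LinearRegression()   # instantiate
model.fit(x_train, y_train)               # train
predictions = model.predict(x_test)       # predict

# One coefficient per feature, in the order of `features`.
for name, coef in zip(features, model.coef_):
    print(f"{name}: {coef:+.3f}")
print(f"intercept: {model.intercept_:.1f}")
```

`model.coef_` and `model.intercept_` are available only after `fit` has run. Each coefficient is the predicted change in the target for a one-unit change in that feature, holding the others fixed.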
Evaluate the model
import numpy as np
# RMSE is expressed in the same unit as the target
rmse = np.sqrt(metrics.mean_squared_error(y_test, predictions))
print(f"RMSE = {rmse:.2f} cm")
# Rule of thumb: holds only if the prediction errors are roughly normal and centred on zero
print(f"Roughly 68% of the predictions are within ±{rmse:.2f} cm of the actual height")
print(f"Roughly 95% of the predictions are within ±{2 * rmse:.2f} cm of the actual height")
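The percentages printed above are a rule of thumb, not a guarantee, so it can be worth measuring the actual coverage on the test set. The sketch below does this on synthetic data with normally distributed noise. The data, the coefficients, and the names `within_1` and `within_2` are assumptions for illustration.

```python
import numpy as np
from sklearn import linear_model, metrics, model_selection

rng = np.random.default_rng(0)
n = 2000

# Synthetic features and target with Gaussian noise (standard deviation 5).
X = rng.normal(size=(n, 3))
y = 170 + X @ np.array([2.0, 1.5, 0.0]) + rng.normal(0, 5, n)

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=2020
)
model = linear_model.LinearRegression().fit(x_train, y_train)
predictions = model.predict(x_test)

rmse = np.sqrt(metrics.mean_squared_error(y_test, predictions))
errors = predictions - y_test

# Fraction of test predictions within one and two RMSE of the actual value.
within_1 = np.mean(np.abs(errors) <= rmse)
within_2 = np.mean(np.abs(errors) <= 2 * rmse)
print(f"within 1 RMSE: {within_1:.1%}, within 2 RMSE: {within_2:.1%}")
```

On real data with skewed or heavy-tailed errors, these fractions can differ noticeably from 68% and 95%, in which case the measured values are the ones to report.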