Linear Regression

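Create Dummy Variables

import pandas as pd

# cols_categorical_to_transform is the list of categorical column names to encode.
# drop_first=True drops one level per column to avoid perfectly collinear dummies.
df_dummies = pd.get_dummies(df[cols_categorical_to_transform], drop_first=True)

# Append the dummy columns to the original DataFrame.
df = pd.concat([df, df_dummies], axis=1)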
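
To see what the encoding step produces, here is a minimal sketch on a toy DataFrame. The `Gender` column and its values are hypothetical and only serve to illustrate the call; they are not taken from the original dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df_example = pd.DataFrame({
    "Height": [170.0, 165.5, 180.2, 158.9],
    "Gender": ["M", "F", "M", "F"],
})
cols_categorical_to_transform = ["Gender"]

# With drop_first=True the first level ("F", in sorted order) is dropped,
# leaving a single indicator column for "M".
df_dummies = pd.get_dummies(df_example[cols_categorical_to_transform], drop_first=True)

# The original "Gender" column is still present after the concat.
df_example = pd.concat([df_example, df_dummies], axis=1)
print(df_example.columns.tolist())
```

The original categorical column stays in the DataFrame, so only the new dummy column should be listed in `features` when it is used for modelling.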

Train Test Split

from sklearn import model_selection

target = 'Height'
features = ['Father', 'Mother', 'Kids']

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    df[features], df[target], test_size=0.2, random_state=2020
)
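
A quick sanity check on the split, sketched with synthetic stand-in data. The numbers generated here are assumptions for illustration, not the real dataset. It prints the resulting shapes and confirms that the same `random_state` selects the same rows again.

```python
import numpy as np
import pandas as pd
from sklearn import model_selection

# Synthetic stand-in data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
n_rows = 100
df_demo = pd.DataFrame({
    "Father": rng.normal(175, 6, n_rows),
    "Mother": rng.normal(162, 6, n_rows),
    "Kids": rng.integers(1, 8, n_rows),
})
df_demo["Height"] = (
    20 + 0.4 * df_demo["Father"] + 0.3 * df_demo["Mother"] + rng.normal(0, 2, n_rows)
)

target = "Height"
features = ["Father", "Mother", "Kids"]

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    df_demo[features], df_demo[target], test_size=0.2, random_state=2020
)
print(x_train.shape, x_test.shape)

# Repeating the split with the same random_state picks the same training rows.
x_train_again, _, _, _ = model_selection.train_test_split(
    df_demo[features], df_demo[target], test_size=0.2, random_state=2020
)
print(x_train.index.equals(x_train_again.index))
```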

Import the specific model type

from sklearn import linear_model
from sklearn import metrics

Instantiate, Train, and Predict (3 Core Steps)

model = linear_model.LinearRegression()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
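
After fitting, the learned intercept and per-feature coefficients can be read from the model's `intercept_` and `coef_` attributes. The sketch below uses synthetic data with known coefficients (an assumption made purely for illustration) so the fitted values can be compared against them.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic data with known coefficients (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n_rows = 200
x_demo = pd.DataFrame({
    "Father": rng.normal(175, 6, n_rows),
    "Mother": rng.normal(162, 6, n_rows),
    "Kids": rng.integers(1, 8, n_rows),
})
y_demo = (
    20
    + 0.4 * x_demo["Father"]
    + 0.3 * x_demo["Mother"]
    - 0.1 * x_demo["Kids"]
    + rng.normal(0, 1, n_rows)
)

model = linear_model.LinearRegression()
model.fit(x_demo, y_demo)

# coef_ follows the column order of the training data.
coefficients = dict(zip(x_demo.columns, model.coef_))
print("Intercept:", model.intercept_)
for name, value in coefficients.items():
    print(name, round(float(value), 3))
```

Each coefficient is the expected change in the target for a one-unit increase in that feature, holding the other features fixed.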

Evaluate the model

import numpy as np

# RMSE is expressed in the same units as the target (cm here).
rmse = np.sqrt(metrics.mean_squared_error(y_test, predictions))
print("RMSE = " + str(np.round(rmse, 2)) + " cm")
# Rule of thumb, assuming roughly normal, unbiased errors: ~68% within 1 RMSE, ~95% within 2 RMSE.
print("About 68% of the predictions are within +/- " + str(np.round(rmse, 2)) + " cm of the actual height")
print("About 95% of the predictions are within +/- " + str(np.round(2 * rmse, 2)) + " cm of the actual height")
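
The percentages above are a rule of thumb that holds only when the prediction errors are roughly normal and centred on zero. One way to check this is to measure the actual share of test residuals that fall within one and two RMSE. The sketch below uses synthetic data with normally distributed noise (an assumption for illustration), so the shares should land close to the rule of thumb; on real data they may differ.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model, metrics, model_selection

# Synthetic data with normal noise (hypothetical, for illustration only).
rng = np.random.default_rng(2)
n_rows = 1000
x_demo = pd.DataFrame({
    "Father": rng.normal(175, 6, n_rows),
    "Mother": rng.normal(162, 6, n_rows),
})
y_demo = 20 + 0.4 * x_demo["Father"] + 0.3 * x_demo["Mother"] + rng.normal(0, 5, n_rows)

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=2020
)

model = linear_model.LinearRegression()
model.fit(x_train, y_train)
predictions = model.predict(x_test)

rmse = np.sqrt(metrics.mean_squared_error(y_test, predictions))
residuals = y_test - predictions

# Share of test predictions whose absolute error is within 1 and 2 RMSE.
within_1 = float(np.mean(np.abs(residuals) <= rmse))
within_2 = float(np.mean(np.abs(residuals) <= 2 * rmse))
print("Within 1 RMSE:", round(within_1, 3))
print("Within 2 RMSE:", round(within_2, 3))
```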