R^2 Score - Nori12/Machine-Learning-Tutorial GitHub Wiki


R^2 Score

Also known as the coefficient of determination, R^2 is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). Several definitions of R^2 exist, and they are only sometimes equivalent.

When an intercept is included and there is a single regressor, R^2 is simply the square of the sample correlation coefficient (i.e., r) between the observed outcomes and the observed predictor values. When additional regressors are included, R^2 is the square of the coefficient of multiple correlation. In both cases, the coefficient of determination normally ranges from 0 to 1. A value of 1 corresponds to a perfect prediction, and a value of 0 corresponds to a constant model that just predicts the mean of the training set responses.
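As a minimal sketch of the relationship above (with made-up toy data), we can compute R^2 from its sums-of-squares definition, R^2 = 1 - SS_res / SS_tot, and check that for a single-regressor least-squares fit with an intercept it equals the squared sample correlation between x and y:

```python
import numpy as np

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Fit y = a + b*x by least squares, including an intercept.
b, a = np.polyfit(x, y, 1)
y_pred = a + b * x

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
r2 = 1 - ss_res / ss_tot

# Squared sample correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

print(r2, r ** 2)  # the two values agree
```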

Note: the intercept (often labeled the constant) is the expected mean value of Y when all X = 0.

Depending on the definition used, the computational formula for R^2 can yield negative values. This can arise when the predictions being compared to the outcomes were not derived from a model-fitting procedure using those data. Even when a model-fitting procedure has been used, R^2 may still be negative, for example when linear regression is conducted without an intercept, or when a non-linear function is used to fit the data. Whenever negative values arise, the mean of the data provides a better fit to the outcomes than the fitted values do, according to this particular criterion.
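A small sketch of this, using scikit-learn's `r2_score` with toy values: predictions not derived from the data (here, a constant 0) score below zero, while predicting the mean scores exactly zero:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Predictions not derived from fitting these data: always predict 0.
y_bad = np.zeros_like(y_true)
bad_score = r2_score(y_true, y_bad)
print(bad_score)  # negative: worse than predicting the mean

# Predicting the mean of the outcomes gives R^2 = 0 exactly.
y_mean = np.full_like(y_true, y_true.mean())
mean_score = r2_score(y_true, y_mean)
print(mean_score)  # 0.0
```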

We can also evaluate a model using its score method, which for regressors returns the R^2 score.

print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))