Ordinary Least Squares - Nori12/Machine-Learning-Tutorial GitHub Wiki
Overview
Ordinary Least Squares (OLS) is one of many ways to implement a linear regression algorithm, and it is the simplest one. In this page, we are going to understand how it is calculated with an intuitive example.
Let us take a simple dataset to explain the linear regression model through OLS.
Dataset
Take our example dataset. The “Years of Experience” column holds the independent variable, and the “Salary in 1000$” column holds the dependent variable.
Sum of Squared Errors (SSE)
In order to fit the best line through the points in the scatter plot above, we use a metric called the “Sum of Squared Errors” (SSE) and compare candidate lines, choosing the one that minimizes the error. Each error is the difference between an actual value and its predicted value; the SSE squares each of these errors and adds them up:

SSE = Σ (y_actual - y_predicted)²
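The SSE calculation can be sketched as a small Python function. The sample values below are made up purely for illustration; they are not the tutorial's dataset.

```python
def sum_squared_errors(actual, predicted):
    """SSE: square each error (actual - predicted) and add them up."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Hypothetical values, for illustration only (not the tutorial's data):
actual = [39, 46, 52]
predicted = [41, 45, 50]
print(sum_squared_errors(actual, predicted))  # (-2)^2 + 1^2 + 2^2 = 9
```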
The resulting SSE is 5226.19. To find the best-fitting line, we apply a linear regression model that makes the SSE as small as possible. A line is identified by its slope and intercept, using the equation:
y = mx + b
where m is the slope, x is the independent variable, and b is the intercept.
To apply the OLS method, we use the following formulas to find the slope m and the intercept b:

m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²

b = ȳ - m * x̄

where x̄ and ȳ are the means of the x and y values. Below is a simple table to calculate those sums.
m = 1037.8 / 216.19
m = 4.80
b = 45.44 - 4.80 * 7.56 = 9.15
Hence, with m = 4.80 and b = 9.15, the equation y = mx + b becomes:
y = 4.80x + 9.15
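The slope and intercept calculation above can be sketched as a short Python function. The sanity-check data at the bottom is a made-up, perfectly linear example (y = 2x + 1), not the tutorial's dataset.

```python
def ols_fit(x, y):
    """Fit y = m*x + b by ordinary least squares for a single feature."""
    n = len(x)
    x_bar = sum(x) / n  # mean of x
    y_bar = sum(y) / n  # mean of y
    # m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    m = num / den
    b = y_bar - m * x_bar  # intercept from the two means
    return m, b

# Sanity check on a made-up, perfectly linear example: y = 2x + 1
m, b = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0
```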
Comparing our OLS method result with MS-Excel’s built-in linear regression, we can see that they are pretty much the same; the small difference comes from rounding in the hand calculation.
Our OLS method output → y = 4.80x + 9.15
MS-Excel Linear Reg. Output → y = 4.79x + 9.18
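As a similar cross-check in code, NumPy's `polyfit` with degree 1 performs the same least-squares line fit. The (x, y) pairs below are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical (x, y) pairs, for illustration only:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([9.8, 14.5, 19.3, 24.1, 28.7])

# A degree-1 polyfit returns [slope, intercept] of the least-squares line.
m, b = np.polyfit(x, y, deg=1)
print(round(m, 2), round(b, 2))
```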