Logistic Regression - rafi966/Machine-Learning GitHub Wiki

In the Linear regression module, you explored how to construct a model to make continuous numerical predictions, such as the fuel efficiency of a car. But what if you want to build a model to answer questions like "Will it rain today?" or "Is this email spam?" This module introduces a new type of regression model called logistic regression that is designed to predict the probability of a given outcome.


Logistic regression: Calculating a probability with the sigmoid function

Many problems require a probability estimate as output. Logistic regression is an extremely efficient mechanism for calculating probabilities. Practically speaking, you can use the returned probability in either of the following two ways:

  • Applied "as is." For example, if a spam-prediction model takes an email as input and outputs a value of 0.932, this implies a 93.2% probability that the email is spam.

  • Converted to a binary category such as True or False, Spam or Not Spam.

This module focuses on using logistic regression model output as-is. In the Classification module, you'll learn how to convert this output into a binary category.
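The two uses above can be sketched in a few lines of Python. The probability value (0.932) and the 0.5 classification threshold are illustrative examples, not outputs of any specific model:

```python
# Sketch: two ways to use a logistic regression model's output probability.
# The 0.932 probability and the 0.5 threshold are illustrative values.

p_spam = 0.932  # probability output by a hypothetical spam-prediction model

# 1. Use the probability "as is."
print(f"Probability of spam: {p_spam:.1%}")  # → Probability of spam: 93.2%

# 2. Convert to a binary category with a classification threshold.
THRESHOLD = 0.5
label = "Spam" if p_spam >= THRESHOLD else "Not spam"
print(label)  # → Spam
```

Choosing a good threshold is covered in the Classification module; 0.5 is just a common default.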

Sigmoid function

You might be wondering how a logistic regression model can ensure its output represents a probability, always outputting a value between 0 and 1. As it happens, there's a family of functions called logistic functions whose output has those same characteristics. The standard logistic function, also known as the sigmoid function (sigmoid means "s-shaped"), has the formula:

f(x) = 1 / (1 + e^(–x))

Figure 1. Graph of the sigmoid function. The curve approaches 0 as x decreases toward negative infinity, and 1 as x increases toward infinity.

As the input, x, increases, the output of the sigmoid function approaches but never reaches 1. Similarly, as the input decreases, the sigmoid function's output approaches but never reaches 0.

The table below shows the output values of the sigmoid function for input values in the range –7 to 7. Note how quickly the sigmoid approaches 0 for decreasing negative input values, and how quickly the sigmoid approaches 1 for increasing positive input values.

However, no matter how large or how small the input value, the output will always be greater than 0 and less than 1.

Figure 2. Left: graph of the linear function z = 2x + 5, with three points highlighted. Right: sigmoid curve with the same three points highlighted after being transformed by the sigmoid function.

In Figure 2, a linear equation becomes input to the sigmoid function, which bends the straight line into an s-shape. Notice that the linear equation can output very big or very small values of z, but the output of the sigmoid function, y', is always between 0 and 1, exclusive. For example, the yellow square on the left graph has a z value of –10, but the sigmoid function in the right graph maps that –10 into a y' value of 0.00004.

Logistic regression: Loss and regularization


Logistic regression models are trained using the same process as linear regression models, with two key distinctions:

  • Logistic regression models use Log Loss as the loss function instead of squared loss.

  • Applying regularization is critical to prevent overfitting.

The following sections discuss these two considerations in more depth.

Log Loss
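As a sketch, Log Loss (the binary cross-entropy loss) summed over a dataset can be computed as follows; the labels and predictions below are illustrative values, not outputs of a real model:

```python
import math

def log_loss(labels, predictions):
    """Log Loss summed over a dataset:
    sum of -y*log(y') - (1 - y)*log(1 - y') for each (label y, prediction y')."""
    return sum(
        -y * math.log(p) - (1 - y) * math.log(1 - p)
        for y, p in zip(labels, predictions)
    )

# Confident, correct predictions give low loss;
# confident, wrong predictions give high loss.
print(log_loss([1, 0], [0.9, 0.1]))  # low: predictions near the labels
print(log_loss([1, 0], [0.1, 0.9]))  # high: predictions far from the labels
```

Note that the loss grows without bound as a prediction approaches the wrong extreme, which is one reason regularization (discussed next) matters.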

Log Loss is defined over a dataset D of labeled examples (x, y) as:

Log Loss = Σ over (x, y) in D of: –y · log(y') – (1 – y) · log(1 – y')

where y is the label (0 or 1) and y' is the model's predicted probability for that example.

Regularization in logistic regression

Regularization, a mechanism for penalizing model complexity during training, is extremely important in logistic regression modeling. Without regularization, the asymptotic nature of logistic regression would keep driving loss toward 0 in cases where the model has a large number of features. Consequently, most logistic regression models use one of the following two strategies to decrease model complexity: