【Chapter 1】Logistic Regressions - stephanie0324/MachineLearning GitHub Wiki

Overview

Logistic regression is a statistical model used to predict the probability of a **binary outcome.
Logistic regression is most frequently used method for modeling binary response data and binary classification.

Key Concept: Odds and Log-Odds

Odds

The odds in favor of a particular event ( y = 1 ) can be expressed as: $ \text{odds}(y = 1 | x) = \frac{p}{1 - p} $ where ( p ) represents the probability of the positive class (e.g., class 1). In other words, odds is the ratio of the probability of an event occurring to the probability of it not occurring.

Log-Odds (Logit)

The logit function is the natural logarithm of the odds: $ \text{logit}(p) = \log \left( \frac{p}{1 - p} \right) $ The logit function takes values in the range (0, 1) and transforms them into values over the entire real number range (-∞, ∞).

In logistic regression, we assume a linear relationship between the weighted inputs (referred to as net inputs) and the log-odds: $ \text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$ This linear function relates the log-odds of the probability to the features ( x_1, x_2, \dots, x_k ) through the weights ( \beta_0, \beta_1, \dots, \beta_k ).

However, the ultimate goal is to predict the probability ( p ), not just the log-odds. The inverse logit function maps the log-odds back to the probability ( p ) between 0 and 1.

Sigmoid Function (Logistic Function)

The inverse of the logit function is the logistic sigmoid function, which has the following form: $ p = \frac{1}{1 + e^{-z}} $ where ( z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k ) is the net input (a linear combination of the features).

The sigmoid function has an S-shape and maps values from ( -\infty ) to ( +\infty ) onto the interval ( (0, 1) ), making it a perfect choice for modeling probabilities.

Final Thoughts

Logistic regression is a simple yet powerful model for binary classification. It’s based on a probabilistic framework that allows us to estimate the probability of an event, and it uses the logistic sigmoid function to map the linear output into a valid probability. Logistic regression can also be extended to multiclass classification through techniques such as multinomial logistic regression or One-vs-Rest (OvR).