Logistic Regression - niranjv/ml-notes GitHub Wiki

Overview

Used when the dependent variable (response) is categorical, usually binary. Extensions include multinomial logistic regression for > 2 unordered classes, ordered logistic regression for ordinal responses, mixed logit, conditional logistic regression, conditional random fields, etc.
Used to estimate probability of binary response using 1 or more covariates
P(Y|X) is Bernoulli, not Gaussian.
Y_i are not identically distributed since P(Y_i|X_i) depends on the value of X_i. But Y_i are independent conditional on X_i and $\beta$
Predicted values are probabilities (strictly between 0 and 1 due to the logistic function); threshold the predicted probabilities (e.g. at 0.5) to assign observations to classes
Alternative to linear discriminant analysis.
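Logit function is the inverse of the logistic function. The logit (log odds) of the probability equals the linear predictor, i.e. the RHS of a linear regression equation: $\log \frac{P(Y=1|X)}{1 - P(Y=1|X)} = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$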
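
The relationship between the logistic function, the logit, and thresholding can be sketched in a few lines of Python. This is a minimal illustration for a single covariate; the coefficient values and the function names (`predict_proba`, `classify`) are made up for the example and are not fitted to any data.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real z to a value in (0, 1)."""
    # Branch on the sign of z so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Logit (log odds): inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def predict_proba(x, beta0, beta1):
    """P(Y=1 | x): logistic function applied to the linear predictor."""
    return logistic(beta0 + beta1 * x)

def classify(x, beta0, beta1, threshold=0.5):
    """Turn the predicted probability into a class label by thresholding."""
    return 1 if predict_proba(x, beta0, beta1) >= threshold else 0

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -1.0, 2.0
p = predict_proba(1.5, beta0, beta1)
log_odds = logit(p)  # recovers the linear predictor beta0 + beta1 * 1.5
```

Applying `logit` to a predicted probability gives back the linear predictor, which is the sense in which the log odds are linear in the covariates.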

Estimation of Model Parameters

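Model parameters must be estimated by maximum likelihood using an iterative method (e.g. Newton-Raphson / iteratively reweighted least squares); no closed-form expression is available, unlike in linear regression. The method can fail to converge due to:

p >> n (many predictors relative to observations) => overly conservative Wald statistic; can lead to non-convergence
Multicollinearity => high std errors of model parameters; convergence becomes less likely as it increases
Sparseness in data => large number of empty cells (zero counts) => problematic for categorical predictors => no convergence because log(0) is undefined => collapse categories or add a constant to all cells
Complete separation => predictors perfectly classify all observations => maximum likelihood estimates do not exist (coefficients grow without bound) => re-examine the data, since this may indicate an error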
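
As a rough sketch of what the iterative fit looks like, the code below runs Newton-Raphson on the log-likelihood for a model with an intercept and one covariate, and shows that it reports non-convergence on completely separated data. The function name `fit_logistic_newton`, the tolerances, and both toy datasets are assumptions made for this illustration; statistical packages use more careful implementations.

```python
import math

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_newton(x, y, max_iter=25, tol=1e-8):
    """Newton-Raphson for logit(P(Y=1|x)) = b0 + b1 * x.

    Returns (b0, b1, converged). `converged` is False if the iteration limit
    is reached or the information matrix becomes numerically singular.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood (score vector)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), with weights p * (1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            return b0, b1, False  # singular: typical once fitted p hits 0 or 1
        # Newton step: solve the 2x2 system H * d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, True
    return b0, b1, False

# Toy data with overlapping classes: a finite maximum likelihood estimate exists
x_overlap = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_overlap = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, converged = fit_logistic_newton(x_overlap, y_overlap)

# Toy data with complete separation (x <= 2 is always 0, x >= 3 is always 1):
# the slope keeps growing and the iteration never settles
x_sep = [1.0, 2.0, 3.0, 4.0]
y_sep = [0, 0, 1, 1]
_, _, converged_sep = fit_logistic_newton(x_sep, y_sep)
```

On the overlapping data the step size shrinks quickly and the loop stops; on the separated data each step pushes the slope further out until the weights `p * (1 - p)` vanish, which is the non-convergence described above.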

Goodness of Fit of Model

Use deviance to assess goodness of fit of model; analogous to the residual sum of squares in linear regression. Small values => less deviance from 'full' (saturated) model => better fit
Pseudo R^2 - several measures available (e.g. McFadden, Cox & Snell, Nagelkerke), each with its own limitations
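
A small sketch of how deviance and one pseudo R^2 (McFadden's, picked here only as an example) can be computed from observed 0/1 responses and fitted probabilities. It assumes ungrouped binary data, for which the saturated model has log-likelihood 0 and the deviance reduces to -2 times the log-likelihood. The fitted probabilities below are hypothetical values, not the output of a real fit, and all function names are illustrative.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood; assumes every fitted probability is strictly in (0, 1)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def deviance(y, p):
    """Deviance for ungrouped binary data: -2 * log-likelihood."""
    return -2.0 * log_likelihood(y, p)

def null_deviance(y):
    """Deviance of the intercept-only model, which predicts mean(y) for everyone.

    Assumes y contains both classes, so that 0 < mean(y) < 1.
    """
    p_bar = sum(y) / len(y)
    return deviance(y, [p_bar] * len(y))

def mcfadden_r2(y, p):
    """McFadden's pseudo R^2: 1 - (model deviance / null deviance)."""
    return 1.0 - deviance(y, p) / null_deviance(y)

# Observed responses and hypothetical fitted probabilities (illustration only)
y_obs = [0, 0, 1, 0, 1, 1]
p_hat = [0.1, 0.3, 0.6, 0.4, 0.7, 0.9]

d_model = deviance(y_obs, p_hat)
d_null = null_deviance(y_obs)
r2 = mcfadden_r2(y_obs, p_hat)
```

A model deviance well below the null deviance, and hence a pseudo R^2 further from 0, suggests the covariates improve on the intercept-only model.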

Significance of model coefficients

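Likelihood ratio test - compares the deviance of models with and without the predictor(s); the difference is approximately chi-squared distributed, with df = number of predictors dropped
Wald statistic - coefficient estimate divided by its std error (its square is approximately chi-squared with 1 df); can be unreliable when coefficients are large or data are sparse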
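
Both tests can be sketched for the simplest case of a single coefficient (1 degree of freedom), where the chi-squared and normal tail probabilities can be written with `math.erfc` from the standard library. The function names and the numeric inputs are hypothetical, chosen only to show the calculation; in practice the log-likelihoods, estimates and std errors come from the fitted models.

```python
import math

def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for dropping a single coefficient.

    The statistic 2 * (loglik_full - loglik_reduced) equals the difference in
    deviances and is compared to a chi-squared distribution with 1 df.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

def wald_test(beta_hat, std_err):
    """Wald z statistic and two-sided p-value from the standard normal."""
    z = beta_hat / std_err
    # Two-sided normal tail: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical inputs (illustration only)
lr_stat, lr_p = likelihood_ratio_test_1df(loglik_reduced=-4.16, loglik_full=-1.95)
wald_z, wald_p = wald_test(beta_hat=1.3, std_err=0.9)
```

The two tests need not agree in small samples, which is one reason both are commonly reported.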

Ref

ISLR, Section 4.3
Wikipedia
