Logistic Regression - niranjv/ml-notes GitHub Wiki

Overview

  • Used when the dependent variable (response) is categorical (usually 2 classes). Extensions include multinomial logistic regression for > 2 unordered classes, ordered logistic regression for ordinal responses, mixed logit, conditional random fields, conditional logistic regression, etc.
  • Used to estimate the probability of a binary response from 1 or more covariates (predictors)
  • P(Y|X) is Bernoulli, not Gaussian as in linear regression.
  • The Y_i are not identically distributed, since P(Y_i|X_i) depends on the value of X_i, but they are independent conditional on X_i and $\beta$.
  • Predicted values are probabilities (in (0, 1) because of the logistic function); threshold the predicted probabilities (e.g., at 0.5) to classify observations into categories.
  • Alternative to linear discriminant analysis.
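  • The logit function is the inverse of the logistic function. The logit (log odds) of the probability equals the right-hand side of a linear regression equation: $\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$, so $p = 1 / (1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)})$.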
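A minimal Python sketch of the relationships above, assuming hypothetical coefficient values: `logistic` maps the linear predictor to a probability, `logit` inverts it back to log odds, and `classify` applies a threshold (0.5 is an assumed default here, not a rule; the best cut-off depends on the application).

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    # Two branches avoid overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """Logit (log odds): the inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict_proba(x: Sequence[float], beta: Sequence[float]) -> float:
    """P(Y = 1 | x), where beta[0] is the intercept and beta[1:] are the slopes."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))  # linear predictor (log odds)
    return logistic(eta)


def classify(p: float, threshold: float = 0.5) -> int:
    """Turn a predicted probability into a class label (1 if p >= threshold, else 0)."""
    return int(p >= threshold)


# Hypothetical coefficients: intercept -1.0, slope 2.0
beta = [-1.0, 2.0]
p = predict_proba([1.5], beta)  # eta = -1.0 + 2.0 * 1.5 = 2.0
print(round(p, 4))              # 0.8808
print(round(logit(p), 4))       # 2.0, recovering the linear predictor
print(classify(p))              # 1
```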

Estimation of Model Parameters

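Model parameters must be estimated by an iterative method (typically maximum likelihood, e.g., via Newton-Raphson, which for logistic regression is equivalent to iteratively reweighted least squares); unlike linear regression, there is no closed-form expression. The method can fail to converge due to: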

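  • Many covariates relative to the number of observations (large p/n) => overly conservative Wald statistic => possible non-convergence
  • Multicollinearity => inflated standard errors of the coefficient estimates and lower chance of convergence
  • Sparseness in data => large proportion of empty (zero-count) cells => especially problematic for categorical predictors => no convergence, because log(0) is undefined => remedy: collapse categories or add a constant to all cells
  • Complete separation => covariates perfectly predict the response (all cases classified correctly) => maximum likelihood estimates do not exist (coefficients diverge to ±∞); re-examine the data, as this may indicate an error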
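The iterative estimation described above can be sketched as Newton-Raphson (equivalently, iteratively reweighted least squares) in NumPy. This is an illustrative sketch, not a production fitter: the function name `fit_logistic_newton`, its defaults, and the toy data are assumptions. The second call illustrates complete separation: the coefficients keep growing and the loop never meets the convergence tolerance.

```python
import numpy as np


def fit_logistic_newton(X, y, max_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 responses.
    Returns (beta, converged).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        w = p * (1.0 - p)                      # Bernoulli variances = IRLS weights
        score = X.T @ (y - p)                  # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])          # Fisher information (negative Hessian)
        try:
            step = np.linalg.solve(info, score)
        except np.linalg.LinAlgError:          # singular information matrix
            return beta, False
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta, True
    return beta, False


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])      # intercept + one covariate

# Overlapping classes: the MLE exists and Newton-Raphson converges
beta, ok = fit_logistic_newton(X, [0, 1, 0, 1, 1, 1])
print(ok, beta)

# Complete separation (y = 1 exactly when x > 0): no finite MLE
beta_sep, ok_sep = fit_logistic_newton(X, [0, 0, 0, 1, 1, 1])
print(ok_sep, beta_sep)                        # ok_sep is False; coefficients keep growing
```

Capping `max_iter` matters here: without it, the separated case would loop while the slope drifts toward infinity.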

Goodness of Fit of Model

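  • Use deviance to assess goodness of fit of the model; deviance plays the role of the residual sum of squares in linear regression. Deviance = -2 × (log-likelihood of fitted model - log-likelihood of saturated ('full') model); for 0/1 responses the saturated log-likelihood is 0. Small values => less deviance from the 'full' model => better fit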
  • Pseudo R^2: several measures are available (e.g., McFadden, Cox & Snell, Nagelkerke), each with its own limitations
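A minimal sketch of both quantities for 0/1 responses, assuming fitted probabilities are already available (the probabilities below are made up for illustration). Deviance is computed against the saturated model, whose log-likelihood is 0 for ungrouped binary data; McFadden's pseudo R^2 is shown as one example of the several pseudo R^2 measures.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood; assumes every 0 < p_i < 1."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi) for yi, pi in zip(y, p))


def deviance(y: Sequence[int], p: Sequence[float]) -> float:
    """-2 * (LL of fitted model - LL of saturated model); the latter is 0 for 0/1 data."""
    return -2.0 * log_likelihood(y, p)


def mcfadden_r2(y: Sequence[int], p: Sequence[float]) -> float:
    """McFadden's pseudo R^2 = 1 - LL(model) / LL(intercept-only model).

    Assumes both classes occur in y, so the null log-likelihood is nonzero.
    """
    p_null = sum(y) / len(y)  # the intercept-only MLE predicts the sample proportion
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


y = [0, 0, 1, 1]
p_fit = [0.1, 0.2, 0.8, 0.9]             # hypothetical fitted probabilities
print(round(deviance(y, p_fit), 3))      # 1.314
print(round(mcfadden_r2(y, p_fit), 3))   # 0.763
```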

Significance of model coefficients

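  • Likelihood ratio test: 2 × (difference in log-likelihoods of two nested models), i.e., the drop in deviance, compared to a chi-squared distribution with df = number of parameters removed
  • Wald statistic: (coefficient estimate / standard error)^2, compared to a chi-squared distribution with 1 df; can be unreliable when coefficients are large, because their standard errors become inflated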
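Both tests can be sketched in NumPy for a model with an intercept and a single covariate, so each test has 1 degree of freedom. Assumptions: the helper names and toy data are hypothetical, standard errors come from the inverse Fisher information at the MLE, and the 1-df chi-squared tail probability uses `math.erfc`, since P(χ²₁ > x) = erfc(√(x/2)), to avoid a SciPy dependency.

```python
import math

import numpy as np


def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE; returns (beta, cov) with cov = inverse Fisher information."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = sigmoid(X @ beta)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)


def log_lik(y, p):
    """Bernoulli log-likelihood of 0/1 responses y under probabilities p."""
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-squared(1) variable: P(Z^2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 1, 0, 0, 1, 1, 0, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])  # intercept + one covariate

beta, cov = fit_logistic(X, y)

# Wald test for the slope: (estimate / standard error)^2 ~ chi-squared(1)
wald = beta[1] ** 2 / cov[1, 1]
print("Wald:", wald, "p =", chi2_sf_1df(wald))

# Likelihood ratio test vs. the intercept-only model (whose MLE predicts mean(y))
ll_full = log_lik(y, sigmoid(X @ beta))
ll_null = log_lik(y, np.full_like(y, y.mean()))
lr = 2.0 * (ll_full - ll_null)  # drop in deviance; df = 1 removed parameter
print("LR:", lr, "p =", chi2_sf_1df(lr))
```

For more than one removed parameter, the LR statistic would instead be compared to a chi-squared distribution with the matching df (e.g., via a statistics library).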
