01 Generalized Linear Model
Generalized Linear Model (GLM)
A Generalized Linear Model (GLM) is made up of:

- a random response component, denoted $Y$, whose distribution belongs to the Exponential Family of Distributions
- a systematic component consisting of a linear combination of discrete or continuous predictors and fixed parameters. The systematic component, $\eta$, is represented as
  $$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
  where we have $p$ predictors, the $x$'s are the predictors of the response, and the $\beta$'s are the linear parameters of the GLM
- a link function that connects the random response to the linear combination of predictor variables.
In a GLM we assume that some linear combination of predictors and parameter values can explain variability in the random response component; we may just need to apply a link function to that linear combination in order to relate it properly to the response. Formally, the link function $g$ connects the mean of the response, $\mu = E[Y]$, to the systematic component: $g(\mu) = \eta$.
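As an illustration, here is a minimal sketch of the three components using Python's `statsmodels` (the library choice, the simulated data, and all variable names are assumptions for illustration only, not part of the original notes): the random component is a Binomial response, the systematic component is $\eta = \beta_0 + \beta_1 x_1$, and the link is the logit.

```python
# Minimal GLM sketch (assumed setup, not from the original notes):
# random component Y ~ Binomial, systematic component eta = b0 + b1*x,
# link function g(mu) = ln(mu / (1 - mu)) (the logit link).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.normal(size=200)                  # one continuous predictor
eta = -0.5 + 1.2 * x                      # systematic component
mu = 1 / (1 + np.exp(-eta))               # inverse link: mu = g^{-1}(eta)
y = rng.binomial(1, mu)                   # random component: Y ~ Binomial(1, mu)

X = sm.add_constant(x)                    # adds the intercept column for beta_0
model = sm.GLM(y, X, family=sm.families.Binomial())  # Binomial family, default logit link
result = model.fit()
print(result.params)                      # estimates of beta_0 and beta_1
```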
Exponential Family Distribution
A random variable $Y$ belongs to the exponential family of distributions if its PDF (for a continuous distribution) or PMF (for a discrete distribution) can be written in the following form:
$$f(y;\, \theta, \phi) = \exp\left\{ \frac{y\,\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$

where

- $y$ is the response variable
- $\theta$ is called the canonical (natural) parameter of the distribution, which represents the location of the distribution
- $\phi$ is called the dispersion parameter, which represents the scale of the distribution
- $a(\phi)$ is some function that depends on $\phi$ (and possibly other quantities) but does not depend on $\theta$ or $y$
- $b(\theta)$ is some function that depends on $\theta$ (and possibly other quantities) but does not depend on $\phi$ or $y$
- $c(y, \phi)$ is some function that depends on $y$ and $\phi$ (and possibly other quantities) but does not depend on $\theta$
Example: the Binomial Distribution is an Exponential Family Distribution
$$
\begin{aligned}
P(Y=y;\, n, p) &= \binom{n}{y} p^y (1-p)^{n-y} && \text{PMF of the Binomial distribution} \\
&= \exp\left\{ \ln\left\{ \binom{n}{y} p^y (1-p)^{n-y} \right\} \right\} && \text{take $\ln$, then apply $\exp$; since $\exp(\ln(x)) = x$, nothing changes} \\
&= \exp\left\{ \ln\binom{n}{y} + y \ln(p) + (n-y) \ln(1-p) \right\} \\
&= \exp\left\{ \ln\binom{n}{y} + y \ln(p) - y \ln(1-p) + n \ln(1-p) \right\} \\
&= \exp\left\{ \ln\binom{n}{y} + y \ln\left(\tfrac{p}{1-p}\right) + n \ln(1-p) \right\} \\
&= \exp\left\{ y \ln\left(\tfrac{p}{1-p}\right) + n \ln(1-p) + \ln\binom{n}{y} \right\} \\
&= \exp\left\{ y\,\theta + n \ln(1-p) + \ln\binom{n}{y} \right\} && \text{let $\theta = \ln\left(\tfrac{p}{1-p}\right)$} \\
&= \exp\left\{ y\,\theta + n \ln\left(1 - \tfrac{e^\theta}{1+e^\theta}\right) + \ln\binom{n}{y} \right\} && \text{since $\theta = \ln\left(\tfrac{p}{1-p}\right) \Rightarrow e^\theta = \tfrac{p}{1-p} \Rightarrow p = \tfrac{e^\theta}{1+e^\theta}$} \\
&= \exp\left\{ y\,\theta + n \ln\left(\tfrac{1+e^\theta}{1+e^\theta} - \tfrac{e^\theta}{1+e^\theta}\right) + \ln\binom{n}{y} \right\} \\
&= \exp\left\{ y\,\theta + n \ln\left(\tfrac{1}{1+e^\theta}\right) + \ln\binom{n}{y} \right\} \\
&= \exp\left\{ y\,\theta - n \ln(1+e^\theta) + \ln\binom{n}{y} \right\} \\
&= \exp\left\{ y\,\theta - b(\theta) + \ln\binom{n}{y} \right\} && \text{let $b(\theta) = n \ln(1+e^\theta)$; note it does not depend on $\phi$} \\
&= \exp\left\{ \frac{y\,\theta - b(\theta)}{a(\phi)} + \ln\binom{n}{y} \right\} && \text{let $a(\phi) = 1$; note it does not depend on $\theta$} \\
&= \exp\left\{ \frac{y\,\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\} && \text{let $c(y, \phi) = \ln\binom{n}{y}$; note it does not depend on $\theta$}
\end{aligned}
$$
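To sanity-check the algebra above, the following sketch (assuming Python with `numpy`/`scipy`; the specific values $n=10$, $p=0.3$ are arbitrary) evaluates the Binomial PMF both directly and via the exponential-family form, and confirms they agree:

```python
# Check: Binomial PMF == exp{ y*theta - b(theta) + c(y, phi) } with
# theta = ln(p/(1-p)), b(theta) = n*ln(1+e^theta), a(phi) = 1, c(y, phi) = ln(nCy).
import numpy as np
from scipy.special import comb
from scipy.stats import binom

n, p = 10, 0.3                            # arbitrary example values
theta = np.log(p / (1 - p))               # canonical (natural) parameter
b_theta = n * np.log(1 + np.exp(theta))   # b(theta)

for y in range(n + 1):
    direct = binom.pmf(y, n, p)           # nCy * p^y * (1-p)^(n-y)
    expfam = np.exp(y * theta - b_theta + np.log(comb(n, y)))
    assert np.isclose(direct, expfam)     # the two forms agree
print("exponential-family form matches the Binomial PMF")
```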
GLM Deviance
Concept | Linear Regression (Meaning) | Linear Regression (Formula) | GLM (Meaning) | GLM (Formula) |
---|---|---|---|---|
Total variation | Total variation in $y_i$ vs. mean | $\text{TSS} = \sum (y_i - \bar{y})^2$ | Fit of intercept-only model | $D_{\text{null}} = -2 \left[ \log L(\text{null}) - \log L(\text{saturated}) \right]$ |
Model error | Unexplained variance | $\text{RSS} = \sum (y_i - \hat{y}_i)^2$ | Lack of fit of full model | $D_{\text{resid}} = -2 \left[ \log L(\text{fitted}) - \log L(\text{saturated}) \right]$ |
Model gain | Improvement due to predictors | $\text{ESS} = \text{TSS} - \text{RSS}$ | Deviance reduction | $D_{\text{null}} - D_{\text{resid}}$ |
Proportion fit | Proportion of variance explained | $R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$ | Proportion of deviance explained | $R^2_{\text{pseudo}} = 1 - \frac{D_{\text{resid}}}{D_{\text{null}}}$ |
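To make the GLM column concrete, here is a sketch (again assuming Python's `statsmodels` and simulated data) that reads $D_{\text{null}}$ and $D_{\text{resid}}$ off a fitted model and computes the pseudo-$R^2$:

```python
# Deviance quantities from a fitted GLM (statsmodels exposes them as
# `null_deviance` and `deviance` on the results object).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.5 * x))))  # simulated Binomial response

result = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
D_null = result.null_deviance             # -2 * [logL(null) - logL(saturated)]
D_resid = result.deviance                 # -2 * [logL(fitted) - logL(saturated)]
pseudo_r2 = 1 - D_resid / D_null          # proportion of deviance explained
print(D_null, D_resid, pseudo_r2)
```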