Markov Chain Monte Carlo (MCMC) & Bayesian Regression: How They Work Together
1. Bayesian Regression Overview
Bayesian regression is a statistical approach where model parameters are treated as random variables with prior distributions. Unlike classical linear regression (which gives point estimates), Bayesian regression provides a posterior distribution over the parameters, incorporating both prior beliefs and observed data.
- Model Form:
  \[
  y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)
  \]
- Priors: \(\beta_j \sim\) some distribution (e.g., Normal), \(\sigma \sim \text{Exponential}(\lambda)\)
- Likelihood: \(y \sim \mathcal{N}(X\beta, \sigma^2 I)\)
- Posterior: \(P(\beta, \sigma \mid y, X) \propto P(y \mid \beta, \sigma, X) \cdot P(\beta, \sigma)\) (expanded below)
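Written out for the concrete priors used in the example later on this page (independent Normal priors with scale \(\tau\) on the coefficients and an Exponential prior with rate \(\lambda\) on the noise scale; both are illustrative choices), the unnormalised posterior factorises as

\[
P(\beta, \sigma \mid y, X) \;\propto\; \prod_{i=1}^{n} \mathcal{N}\!\left(y_i \,\middle|\, \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},\; \sigma^2\right) \; \prod_{j=0}^{p} \mathcal{N}\!\left(\beta_j \mid 0, \tau^2\right) \; \text{Exponential}(\sigma \mid \lambda).
\]

The normalising constant of this product is an integral over all of \(\beta\) and \(\sigma\), which is rarely available in closed form; this is the intractability that motivates MCMC.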
2. Why MCMC is Needed
The posterior distribution is usually analytically intractable: its normalising constant requires a high-dimensional integral over all parameters. MCMC methods approximate the posterior by sampling from it without ever computing that integral.
3. How MCMC Works
MCMC constructs a Markov chain whose stationary distribution matches the posterior. Key algorithms:
- Metropolis-Hastings: Proposes new parameter values and accepts/rejects them based on a probability ratio (sketched in code after this list).
- Gibbs Sampling: Samples each parameter sequentially from its conditional posterior.
- Hamiltonian Monte Carlo (HMC): Uses gradient information for efficient exploration (used in Stan/PyMC3).
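To make the mechanics concrete, below is a minimal random-walk Metropolis-Hastings sketch for the single-predictor model used later on this page. The step size, starting point, and prior settings (Normal(0, 10) priors on the coefficients, Exponential(1) on the noise scale) are illustrative choices, not a prescribed implementation:

```python
import numpy as np

def log_posterior(params, X, y, prior_scale=10.0, noise_prior_rate=1.0):
    """Unnormalised log posterior for y = b0 + b1*x + eps, eps ~ N(0, sigma^2)."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)                    # sample sigma on the log scale
    resid = y - (b0 + b1 * X)
    log_lik = -0.5 * np.sum(resid ** 2) / sigma ** 2 - len(y) * np.log(sigma)
    log_prior = (-0.5 * (b0 ** 2 + b1 ** 2) / prior_scale ** 2  # Normal(0, 10) priors
                 - noise_prior_rate * sigma + log_sigma)        # Exponential(1) prior + log-Jacobian
    return log_lik + log_prior

def metropolis_hastings(X, y, n_samples=10_000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    current = np.zeros(3)                        # start at b0 = b1 = 0, sigma = 1
    current_lp = log_posterior(current, X, y)
    samples = np.empty((n_samples, 3))
    for i in range(n_samples):
        proposal = current + rng.normal(scale=step, size=3)  # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples
```

Averaging the columns of `metropolis_hastings(X, y)` gives posterior-mean estimates of \(\beta_0\), \(\beta_1\), and \(\log\sigma\); in practice, libraries such as PyMC3 or Stan handle proposals, tuning, and diagnostics automatically.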
4. How They Work Together
- Define the Bayesian Regression Model:
  - Choose priors for \(\beta\) and \(\sigma\).
  - Specify the likelihood (e.g., Gaussian for linear regression).
- Run MCMC Sampling:
  - Use MCMC to draw samples from \(P(\beta, \sigma \mid \text{data})\).
  - Example: 10,000 samples for each parameter.
- Analyze the Posterior:
  - Compute posterior means, credible intervals, and predictive distributions.
  - Visualize parameter distributions (e.g., using trace plots, density plots).
5. Example (PyMC3/Python)
```python
import numpy as np
import pymc3 as pm

# Synthetic example data (replace with your own X and y)
np.random.seed(42)
X = np.random.randn(100)
y = 1.0 + 2.0 * X + np.random.randn(100) * 0.5

# Bayesian Regression with MCMC
with pm.Model() as model:
    # Priors
    beta = pm.Normal("beta", mu=0, sd=10, shape=2)  # Intercept and slope
    sigma = pm.Exponential("sigma", lam=1)          # Noise scale
    # Likelihood
    mu = beta[0] + beta[1] * X
    y_obs = pm.Normal("y_obs", mu=mu, sd=sigma, observed=y)
    # MCMC sampling (NUTS, a form of HMC, by default)
    trace = pm.sample(5000, tune=1000, chains=4)
```
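With the `trace` in hand (the "Analyze the Posterior" step above), the posterior can be summarised and visualised with PyMC3's built-in helpers. A minimal sketch, assuming the `model` and `trace` objects from the block above:

```python
# Posterior means, credible intervals, effective sample size, and R-hat
print(pm.summary(trace))

# Trace and density plots for every parameter (convergence check)
pm.traceplot(trace)

# Posterior density plots with credible intervals
pm.plot_posterior(trace)
```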
6. Key Benefits
- Uncertainty Quantification: Full posterior distributions for parameters.
- Flexibility: Handles small data, hierarchical models, and complex priors.
- Avoids Overfitting: Priors act as regularization.
7. Challenges
- Computationally intensive for large datasets.
- Requires convergence diagnostics (e.g., \(\hat{R}\), trace plots).
Summary
MCMC enables Bayesian regression by sampling from the posterior when analytical solutions are infeasible. Together, they provide a robust framework for probabilistic inference.
Trace Plot Diagnostics
Trace plots are a key diagnostic tool in MCMC for assessing convergence and mixing of the Markov chains. Below are simulated examples of trace plots (with explanations) for a Bayesian linear regression model.
1. Good Trace Plot (Converged Chains)
A healthy trace plot shows:
- Random noise (no trends or drifts).
- Good mixing (chains overlap and explore the same region).
- Stationarity (no long-term changes in mean/variance).
Example: Regression Coefficient (\(\beta_1\))
| Iteration | Chain 1 | Chain 2 | Chain 3 | Chain 4 |
|-----------|---------|---------|---------|---------|
| 0         | 0.1     | -0.2    | 0.3     | -0.1    |
| 1000      | 0.5     | 0.4     | 0.6     | 0.5     |
| 2000      | 0.3     | 0.2     | 0.4     | 0.3     |
| 3000      | 0.4     | 0.5     | 0.5     | 0.4     |
(Plot not shown: the four chains overlap with no visible trends.)
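A per-chain view like the table above can be reproduced directly from the sampler output. A minimal sketch, assuming the `trace` object from the PyMC3 example earlier on this page:

```python
import matplotlib.pyplot as plt

# Plot the slope beta[1] for each chain; converged chains should overlap
# and look like stationary noise ("hairy caterpillars").
for chain in trace.chains:
    draws = trace.get_values("beta", chains=[chain])[:, 1]
    plt.plot(draws, alpha=0.6, label=f"chain {chain}")
plt.xlabel("Iteration")
plt.ylabel("beta[1]")
plt.legend()
plt.show()
```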
2. Poor Trace Plot (Non-Converged Chains)
Signs of trouble:
- Divergent chains (chains don't overlap).
- Trends (e.g., drifting mean).
- Poor mixing (chains get "stuck").
Example: Divergent Chains for \(\sigma\) (Noise)
| Iteration | Chain 1 | Chain 2 | Chain 3 | Chain 4 |
|-----------|---------|---------|---------|---------|
| 0         | 1.0     | 10.0    | 1.2     | 9.8     |
| 1000      | 1.1     | 9.5     | 1.3     | 10.2    |
| 2000      | 1.2     | 9.0     | 1.4     | 10.5    |
| 3000      | 1.3     | 8.5     | 1.5     | 10.8    |
(Plot not shown: the chains explore different regions and never agree.)
3. Autocorrelation Plot (MCMC Efficiency)
High autocorrelation means slow mixing (samples are correlated).
Ideal: Low autocorrelation after a few lags.
Problem: High autocorrelation means chains "stick" to past values.
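To check this on the regression example, PyMC3's built-in autocorrelation plot can be used. A minimal sketch, assuming the `trace` object from earlier on this page:

```python
# Autocorrelation per parameter and chain; the bars should drop towards
# zero after a few lags when mixing is good. Slowly decaying
# autocorrelation means the effective sample size reported by pm.summary
# is much smaller than the number of draws.
pm.autocorrplot(trace)
```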
How to Interpret Trace Plots in Practice
- Check Convergence:
  - Use `pm.traceplot(trace)` in PyMC3 or `plot(trace)` in Stan.
  - Look for overlapping chains (R-hat < 1.01).
- Fix Issues:
  - Increase `tune` (burn-in) or `target_accept` in HMC (see the sketch after this list).
  - Reparameterize the model (e.g., centered vs. non-centered).
- Validate:
  - Run posterior predictive checks (PPC).
  - Compare prior and posterior to detect conflicts.
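The two most common fixes, and a basic posterior predictive check, look roughly like this in PyMC3. This is a sketch assuming the `model` and observed `y` from the regression example above; the specific `tune` and `target_accept` values are illustrative:

```python
# Longer warm-up and a stricter acceptance target often remove
# divergences and improve mixing for NUTS/HMC.
with model:
    trace = pm.sample(5000, tune=2000, target_accept=0.95, chains=4)

# Posterior predictive check: simulate data from the fitted model and
# compare it to the observed y.
with model:
    ppc = pm.sample_posterior_predictive(trace)
print("observed mean:", y.mean(), "| predictive mean:", ppc["y_obs"].mean())
```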
Key Takeaways
- Good traces: Chains look like "hairy caterpillars" (random noise).
- Bad traces: Chains diverge, drift, or get stuck.
- Always visualize traces before trusting posterior summaries.