Basic Maths

Markov Chain Monte Carlo (MCMC) & Bayesian Regression: How They Work Together

1. Bayesian Regression Overview

Bayesian regression is a statistical approach where model parameters are treated as random variables with prior distributions. Unlike classical linear regression (which gives point estimates), Bayesian regression provides a posterior distribution over the parameters, incorporating both prior beliefs and observed data.

  • Model Form:
    [ y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2) ]
    • Priors: (\beta_j \sim) some distribution (e.g., (\mathcal{N}(0, \tau^2))), (\sigma \sim \text{Exponential}(\lambda))
    • Likelihood: (y \sim \mathcal{N}(X\beta, \sigma^2))
    • Posterior: (P(\beta, \sigma | y, X) \propto P(y | \beta, \sigma, X) \cdot P(\beta, \sigma)) (expanded below)
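Putting the pieces together, the unnormalized posterior for this model (assuming independent (\mathcal{N}(0, \tau^2)) priors on the coefficients and an (\text{Exponential}(\lambda)) prior on (\sigma)) is:

    [ P(\beta, \sigma | y, X) \propto \prod_{i=1}^{n} \mathcal{N}(y_i \mid x_i^\top \beta, \sigma^2) \cdot \prod_{j=0}^{p} \mathcal{N}(\beta_j \mid 0, \tau^2) \cdot \lambda e^{-\lambda \sigma} ]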

2. Why MCMC is Needed

The posterior distribution is often intractable: computing it analytically requires a high-dimensional integral that rarely has a closed form. MCMC methods sidestep this by drawing samples from the posterior directly.
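The culprit is the normalizing constant (the marginal likelihood), which integrates over every parameter:

    [ P(y | X) = \iint P(y | \beta, \sigma, X) \, P(\beta, \sigma) \, d\beta \, d\sigma ]

For the model above this is an integral over all p + 1 coefficients plus (\sigma), with no closed form outside special conjugate cases. MCMC avoids it entirely, because acceptance decisions only need ratios of the unnormalized posterior.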

3. How MCMC Works

MCMC constructs a Markov chain whose stationary distribution matches the posterior. Key algorithms:

  • Metropolis-Hastings: Proposes new parameter values and accepts/rejects them based on a ratio of posterior densities (see the sketch after this list).
  • Gibbs Sampling: Samples each parameter sequentially from its conditional posterior.
  • Hamiltonian Monte Carlo (HMC): Uses gradient information for efficient exploration; its NUTS variant is the default sampler in Stan and PyMC3.
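A minimal random-walk Metropolis-Hastings sketch for the regression model above, in plain NumPy. The simulated data, step size, and priors are illustrative assumptions, not a tuned implementation:

import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed for illustration): y = 1 + 2*x + noise
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

def log_posterior(theta):
    """Unnormalized log posterior: Normal(0, 10) priors on the betas,
    Exponential(1) prior on sigma, sampled on the log scale."""
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)  # log scale keeps sigma positive
    log_lik = np.sum(-0.5 * ((y - b0 - b1 * x) / sigma) ** 2 - np.log(sigma))
    log_prior = -0.5 * (b0**2 + b1**2) / 10**2 - sigma + log_sigma  # + Jacobian
    return log_lik + log_prior

theta = np.zeros(3)   # start at (b0, b1, log sigma) = (0, 0, 0)
step = 0.1            # proposal scale (illustrative; needs tuning in practice)
samples = []
for _ in range(10_000):
    proposal = theta + step * rng.normal(size=3)
    # Symmetric proposal: accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

samples = np.array(samples)[2_000:]  # discard burn-in
print("posterior means:", samples.mean(axis=0))

The proposal scale must be tuned by hand here; HMC/NUTS automates that adaptation, which is why Stan and PyMC3 use it by default.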

4. How They Work Together

  1. Define the Bayesian Regression Model:
    • Choose priors for (\beta) and (\sigma).
    • Specify the likelihood (e.g., Gaussian for linear regression).
  2. Run MCMC Sampling:
    • Use MCMC to draw samples from (P(\beta, \sigma | \text{data})).
    • Example: 10,000 samples for each parameter.
  3. Analyze the Posterior:
    • Compute posterior means, credible intervals, and predictive distributions (see the snippet after this list).
    • Visualize parameter distributions (e.g., using trace plots, density plots).
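Continuing from the Metropolis-Hastings sketch above, step 3 reduces to sample statistics over the draws (the samples array; column 1 holds the slope):

import numpy as np

# `samples` has shape (n_draws, 3) with columns (b0, b1, log_sigma)
post_mean = samples.mean(axis=0)                               # posterior means
ci_low, ci_high = np.percentile(samples, [2.5, 97.5], axis=0)  # 95% credible intervals
prob_slope_pos = (samples[:, 1] > 0).mean()                    # P(beta_1 > 0 | data)
print(post_mean, ci_low, ci_high, prob_slope_pos)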

5. Example (PyMC3/Python)

import numpy as np
import pymc3 as pm

# Simulated data (assumed for illustration): y = 1 + 2*x + noise
rng = np.random.default_rng(42)
X = rng.normal(size=100)
y = 1.0 + 2.0 * X + rng.normal(scale=0.5, size=100)

# Bayesian Regression with MCMC
with pm.Model() as model:
    # Priors
    beta = pm.Normal("beta", mu=0, sd=10, shape=2)  # Intercept and slope
    sigma = pm.Exponential("sigma", lam=1)          # Noise scale
    
    # Likelihood: mean is a linear function of X
    mu = beta[0] + beta[1] * X
    y_obs = pm.Normal("y_obs", mu=mu, sd=sigma, observed=y)
    
    # MCMC sampling: 4 chains, 1000 tuning (burn-in) steps, 5000 draws each
    trace = pm.sample(5000, tune=1000, chains=4)
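Once sampling finishes, the trace can be summarized and inspected with PyMC3's built-ins (continuing from the block above):

# Posterior summary table: means, credible intervals, R-hat, effective sample size
print(pm.summary(trace))

# Trace and density plots for each parameter (visual convergence check)
pm.traceplot(trace)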

6. Key Benefits

  • Uncertainty Quantification: Full posterior distributions for parameters.
  • Flexibility: Handles small data, hierarchical models, and complex priors.
  • Avoids Overfitting: Priors act as regularization (made precise below).
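The regularization claim can be made precise: with (\beta_j \sim \mathcal{N}(0, \tau^2)) priors and (\sigma) held fixed, the MAP estimate coincides with ridge regression:

    [ \hat{\beta}_{\text{MAP}} = \arg\min_{\beta} \left( \| y - X\beta \|^2 + \frac{\sigma^2}{\tau^2} \| \beta \|^2 \right) ]

A tighter prior (smaller (\tau)) means a larger ridge penalty, i.e., stronger shrinkage toward zero.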

7. Challenges

  • Computationally intensive for large datasets.
  • Requires convergence diagnostics (e.g., (\hat{R}), trace plots); a minimal (\hat{R}) computation is sketched below.
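For intuition, here is a minimal version of the Gelman-Rubin (\hat{R}) statistic, which compares between-chain and within-chain variance (libraries such as ArviZ compute a more robust rank-normalized variant):

import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (n_chains, n_draws) for one parameter."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)             # ~1 when chains agree

# Example: well-mixed chains give R-hat close to 1
rng = np.random.default_rng(0)
good = rng.normal(size=(4, 5000))
print(gelman_rubin(good))   # ~1.00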

Summary

MCMC enables Bayesian regression by sampling from the posterior when analytical solutions are infeasible. Together, they provide a robust framework for probabilistic inference.

Trace plots are a key diagnostic tool in MCMC for assessing convergence and mixing of the Markov chains. Below are simulated examples of trace plots (with explanations) for a Bayesian linear regression model.


1. Good Trace Plot (Converged Chains)

A healthy trace plot shows:

  • Random noise (no trends or drifts).
  • Good mixing (chains overlap and explore the same region).
  • Stationarity (no long-term changes in mean/variance).

Example: Regression Coefficient (\beta_1)

Iterations | Chain 1 | Chain 2 | Chain 3 | Chain 4
-----------|---------|---------|---------|--------
0          | 0.1     | -0.2    | 0.3     | -0.1
1000       | 0.5     | 0.4     | 0.6     | 0.5
2000       | 0.3     | 0.2     | 0.4     | 0.3
3000       | 0.4     | 0.5     | 0.5     | 0.4

[Figure: good trace plot; chains overlap with no trends]


2. Poor Trace Plot (Non-Converged Chains)

Signs of trouble:

  • Divergent chains (chains don't overlap).
  • Trends (e.g., drifting mean).
  • Poor mixing (chains get "stuck").

Example: Divergent Chains for (\sigma) (Noise)

Iterations | Chain 1 | Chain 2 | Chain 3 | Chain 4
-----------|---------|---------|---------|--------
0          | 1.0     | 10.0    | 1.2     | 9.8
1000       | 1.1     | 9.5     | 1.3     | 10.2
2000       | 1.2     | 9.0     | 1.4     | 10.5
3000       | 1.3     | 8.5     | 1.5     | 10.8

[Figure: divergent trace plot; the chains never agree]


3. Autocorrelation Plot (MCMC Efficiency)

High autocorrelation means slow mixing (successive samples are strongly correlated).
Ideal: Low autocorrelation after a few lags.
Problem: High autocorrelation → chains "stick" near their past values (see the simulation below).

[Figure: autocorrelation plot]
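To see the mixing difference numerically, compare a nearly independent chain with a highly autocorrelated AR(1) chain (a synthetic illustration, not real sampler output):

import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Well-mixed chain: nearly independent draws
good = rng.normal(size=n)

# Sticky chain: AR(1) with coefficient 0.99 -> slow mixing
sticky = np.zeros(n)
for t in range(1, n):
    sticky[t] = 0.99 * sticky[t - 1] + rng.normal(scale=0.1)

def autocorr(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in (1, 10, 50):
    print(lag, round(autocorr(good, lag), 3), round(autocorr(sticky, lag), 3))
# The good chain's autocorrelation drops to ~0 immediately;
# the sticky chain's stays high for many lags.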


How to Interpret Trace Plots in Practice

  1. Check Convergence:
    • Use pm.traceplot(trace) in PyMC3 or rstan::traceplot(fit) in Stan.
    • Look for overlapping chains (R-hat < 1.01).
  2. Fix Issues:
    • Increase tune (burn-in) or target_accept in HMC.
    • Reparameterize the model (e.g., centered vs. non-centered).
  3. Validate:
    • Posterior predictive checks (PPC); a minimal example follows this list.
    • Compare prior/posterior to detect conflicts.
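For the validation step, a sketch of a posterior predictive check, continuing from the regression model above (in PyMC3, pm.sample_posterior_predictive returns draws keyed by the observed variable's name):

with model:
    ppc = pm.sample_posterior_predictive(trace)

# Replicated datasets should resemble the observed data;
# e.g., compare the mean of the simulated draws to the observed mean
print(ppc["y_obs"].mean(), y.mean())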

Key Takeaways

  • Good traces: Chains look like "hairy caterpillars" (random noise).
  • Bad traces: Chains diverge, drift, or get stuck.
  • Always visualize traces before trusting posterior summaries.