Markov Chain Monte Carlo (MCMC) & Bayesian Regression: How They Work Together
1. Bayesian Regression Overview
Bayesian regression is a statistical approach where model parameters are treated as random variables with prior distributions. Unlike classical linear regression (which gives point estimates), Bayesian regression provides a posterior distribution over the parameters, incorporating both prior beliefs and observed data.
- Model Form:
  \[
  y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)
  \]
- Priors: \(\beta_j \sim\) some distribution (e.g., Normal), \(\sigma \sim \text{Exponential}(\lambda)\)
- Likelihood: \(y \sim \mathcal{N}(X\beta, \sigma^2 I)\)
- Posterior: \(P(\beta, \sigma \mid y, X) \propto P(y \mid \beta, \sigma, X) \cdot P(\beta, \sigma)\) (expanded below)
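Written out for the concrete priors used in the example later on this page (independent Normal priors with scale \(\tau\) on the coefficients and an Exponential prior with rate \(\lambda\) on the noise scale; both are illustrative choices), the unnormalised posterior factorises as

\[
P(\beta, \sigma \mid y, X) \;\propto\; \prod_{i=1}^{n} \mathcal{N}\!\left(y_i \,\middle|\, \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},\; \sigma^2\right) \; \prod_{j=0}^{p} \mathcal{N}\!\left(\beta_j \mid 0, \tau^2\right) \; \text{Exponential}(\sigma \mid \lambda).
\]

The normalising constant of this product is an integral over all of \(\beta\) and \(\sigma\), which is rarely available in closed form; this is the intractability that motivates MCMC.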
2. Why MCMC is Needed
The posterior distribution is usually analytically intractable: its normalising constant requires a high-dimensional integral over all parameters. MCMC methods approximate the posterior by sampling from it without ever computing that integral.
3. How MCMC Works
MCMC constructs a Markov chain whose stationary distribution matches the posterior. Key algorithms:
- Metropolis-Hastings: Proposes new parameter values and accepts/rejects them based on a probability ratio (sketched in code after this list).
- Gibbs Sampling: Samples each parameter sequentially from its conditional posterior.
- Hamiltonian Monte Carlo (HMC): Uses gradient information for efficient exploration (used in Stan/PyMC3).
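To make the mechanics concrete, below is a minimal random-walk Metropolis-Hastings sketch for the single-predictor model used later on this page. The step size, starting point, and prior settings (Normal(0, 10) priors on the coefficients, Exponential(1) on the noise scale) are illustrative choices, not a prescribed implementation:

```python
import numpy as np

def log_posterior(params, X, y, prior_scale=10.0, noise_prior_rate=1.0):
    """Unnormalised log posterior for y = b0 + b1*x + eps, eps ~ N(0, sigma^2)."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)                    # sample sigma on the log scale
    resid = y - (b0 + b1 * X)
    log_lik = -0.5 * np.sum(resid ** 2) / sigma ** 2 - len(y) * np.log(sigma)
    log_prior = (-0.5 * (b0 ** 2 + b1 ** 2) / prior_scale ** 2  # Normal(0, 10) priors
                 - noise_prior_rate * sigma + log_sigma)        # Exponential(1) prior + log-Jacobian
    return log_lik + log_prior

def metropolis_hastings(X, y, n_samples=10_000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    current = np.zeros(3)                        # start at b0 = b1 = 0, sigma = 1
    current_lp = log_posterior(current, X, y)
    samples = np.empty((n_samples, 3))
    for i in range(n_samples):
        proposal = current + rng.normal(scale=step, size=3)  # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples
```

Averaging the columns of `metropolis_hastings(X, y)` gives posterior-mean estimates of \(\beta_0\), \(\beta_1\), and \(\log\sigma\); in practice, libraries such as PyMC3 or Stan handle proposals, tuning, and diagnostics automatically.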
4. How They Work Together
- Define the Bayesian Regression Model:
  - Choose priors for \(\beta\) and \(\sigma\).
  - Specify the likelihood (e.g., Gaussian for linear regression).
- Run MCMC Sampling:
  - Use MCMC to draw samples from \(P(\beta, \sigma \mid \text{data})\).
  - Example: 10,000 samples for each parameter.
- Analyze the Posterior:
  - Compute posterior means, credible intervals, and predictive distributions.
  - Visualize parameter distributions (e.g., using trace plots, density plots).
5. Example (PyMC3/Python)
```python
import numpy as np
import pymc3 as pm

# Synthetic example data (replace with your own X and y)
np.random.seed(42)
X = np.random.randn(100)
y = 1.0 + 2.0 * X + np.random.randn(100) * 0.5

# Bayesian Regression with MCMC
with pm.Model() as model:
    # Priors
    beta = pm.Normal("beta", mu=0, sd=10, shape=2)  # Intercept and slope
    sigma = pm.Exponential("sigma", lam=1)          # Noise scale
    # Likelihood
    mu = beta[0] + beta[1] * X
    y_obs = pm.Normal("y_obs", mu=mu, sd=sigma, observed=y)
    # MCMC sampling (NUTS, a form of HMC, by default)
    trace = pm.sample(5000, tune=1000, chains=4)
```
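With the `trace` in hand (the "Analyze the Posterior" step above), the posterior can be summarised and visualised with PyMC3's built-in helpers. A minimal sketch, assuming the `model` and `trace` objects from the block above:

```python
# Posterior means, credible intervals, effective sample size, and R-hat
print(pm.summary(trace))

# Trace and density plots for every parameter (convergence check)
pm.traceplot(trace)

# Posterior density plots with credible intervals
pm.plot_posterior(trace)
```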
6. Key Benefits
- Uncertainty Quantification: Full posterior distributions for parameters.
- Flexibility: Handles small data, hierarchical models, and complex priors.
- Avoids Overfitting: Priors act as regularization.
7. Challenges
- Computationally intensive for large datasets.
- Requires convergence diagnostics (e.g., \(\hat{R}\), trace plots).
Summary
MCMC enables Bayesian regression by sampling from the posterior when analytical solutions are infeasible. Together, they provide a robust framework for probabilistic inference.
Trace Plot Diagnostics
Trace plots are a key diagnostic tool in MCMC for assessing convergence and mixing of the Markov chains. Below are simulated examples of trace plots (with explanations) for a Bayesian linear regression model.
1. Good Trace Plot (Converged Chains)
A healthy trace plot shows:
- Random noise (no trends or drifts).
- Good mixing (chains overlap and explore the same region).
- Stationarity (no long-term changes in mean/variance).
Example: Regression Coefficient (\(\beta_1\))
| Iteration | Chain 1 | Chain 2 | Chain 3 | Chain 4 |
|-----------|---------|---------|---------|---------|
| 0         | 0.1     | -0.2    | 0.3     | -0.1    |
| 1000      | 0.5     | 0.4     | 0.6     | 0.5     |
| 2000      | 0.3     | 0.2     | 0.4     | 0.3     |
| 3000      | 0.4     | 0.5     | 0.5     | 0.4     |
(Plot not shown: the four chains overlap with no visible trends.)
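A per-chain view like the table above can be reproduced directly from the sampler output. A minimal sketch, assuming the `trace` object from the PyMC3 example earlier on this page:

```python
import matplotlib.pyplot as plt

# Plot the slope beta[1] for each chain; converged chains should overlap
# and look like stationary noise ("hairy caterpillars").
for chain in trace.chains:
    draws = trace.get_values("beta", chains=[chain])[:, 1]
    plt.plot(draws, alpha=0.6, label=f"chain {chain}")
plt.xlabel("Iteration")
plt.ylabel("beta[1]")
plt.legend()
plt.show()
```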
2. Poor Trace Plot (Non-Converged Chains)
Signs of trouble:
- Divergent chains (chains don't overlap).
- Trends (e.g., drifting mean).
- Poor mixing (chains get "stuck").
Example: Divergent Chains for \(\sigma\) (Noise)
| Iteration | Chain 1 | Chain 2 | Chain 3 | Chain 4 |
|-----------|---------|---------|---------|---------|
| 0         | 1.0     | 10.0    | 1.2     | 9.8     |
| 1000      | 1.1     | 9.5     | 1.3     | 10.2    |
| 2000      | 1.2     | 9.0     | 1.4     | 10.5    |
| 3000      | 1.3     | 8.5     | 1.5     | 10.8    |
(Plot not shown: the chains explore different regions and never agree.)
3. Autocorrelation Plot (MCMC Efficiency)
High autocorrelation means slow mixing (samples are correlated).
Ideal: Low autocorrelation after a few lags.
Problem: High autocorrelation means chains "stick" to past values.
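To check this on the regression example, PyMC3's built-in autocorrelation plot can be used. A minimal sketch, assuming the `trace` object from earlier on this page:

```python
# Autocorrelation per parameter and chain; the bars should drop towards
# zero after a few lags when mixing is good. Slowly decaying
# autocorrelation means the effective sample size reported by pm.summary
# is much smaller than the number of draws.
pm.autocorrplot(trace)
```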
How to Interpret Trace Plots in Practice
- Check Convergence:
  - Use `pm.traceplot(trace)` in PyMC3 or `plot(trace)` in Stan.
  - Look for overlapping chains (R-hat < 1.01).
- Fix Issues:
  - Increase `tune` (burn-in) or `target_accept` in HMC (see the sketch after this list).
  - Reparameterize the model (e.g., centered vs. non-centered).
- Validate:
  - Run posterior predictive checks (PPC).
  - Compare prior and posterior to detect conflicts.
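The two most common fixes, and a basic posterior predictive check, look roughly like this in PyMC3. This is a sketch assuming the `model` and observed `y` from the regression example above; the specific `tune` and `target_accept` values are illustrative:

```python
# Longer warm-up and a stricter acceptance target often remove
# divergences and improve mixing for NUTS/HMC.
with model:
    trace = pm.sample(5000, tune=2000, target_accept=0.95, chains=4)

# Posterior predictive check: simulate data from the fitted model and
# compare it to the observed y.
with model:
    ppc = pm.sample_posterior_predictive(trace)
print("observed mean:", y.mean(), "| predictive mean:", ppc["y_obs"].mean())
```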
Key Takeaways
- Good traces: Chains look like "hairy caterpillars" (random noise).
- Bad traces: Chains diverge, drift, or get stuck.
- Always visualize traces before trusting posterior summaries.