16. Adaptive MCMC - mach3-software/MaCh3 GitHub Wiki
Adaptive MCMC
Overview
Manually tuning MCMC is
- Tedious
- Hard
Thankfully, we can automate this process! Adaptive MCMC (AMCMC) "learns" the optimal step size required to efficiently explore the space. For a Gaussian proposal function targeting a roughly Gaussian posterior, the optimal proposal covariance turns out to be the covariance of the posterior multiplied by a constant scaling factor of $2.38^{2}/n_{pars}$, where $n_{pars}$ is the number of parameters. Since we can't know the covariance of the posterior before running a fit, we instead use the covariance of the chain we're currently running. We can then update our proposal function every few steps and hope this converges to the "true" posterior covariance.
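As a rough illustration of the scaling described above, the sketch below estimates a proposal covariance from the steps of a chain. It is not MaCh3 code: the function name `scaled_proposal_covariance` and the list-of-lists chain layout are assumptions made for this example only.

```python
def scaled_proposal_covariance(chain):
    """Return (2.38^2 / n_pars) * sample covariance of the chain.

    `chain` is a list of steps; each step is a list of n_pars values.
    Hypothetical helper for illustration, not part of MaCh3.
    """
    n_steps = len(chain)
    if n_steps < 2:
        raise ValueError("need at least two steps to estimate a covariance")
    n_pars = len(chain[0])

    # Mean of each parameter over the chain so far.
    means = [sum(step[i] for step in chain) / n_steps for i in range(n_pars)]

    # Unbiased sample covariance of the chain.
    cov = [[0.0] * n_pars for _ in range(n_pars)]
    for i in range(n_pars):
        for j in range(n_pars):
            total = sum((s[i] - means[i]) * (s[j] - means[j]) for s in chain)
            cov[i][j] = total / (n_steps - 1)

    # Constant scaling factor quoted in the text.
    scale = 2.38 ** 2 / n_pars
    return [[scale * c for c in row] for row in cov]
```

In a real fit this estimate would be recomputed (or updated recursively) every few steps rather than from scratch, since the chain keeps growing.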
Troubleshooting
When does it Work?
Adaptive MCMC works really well when all your parameters are Gaussian! This is typically the case for ND-only fits and most cross-section/flux parameters. As a rule of thumb it's best to start throwing from the adapted covariance matrix about 1,000 to 10,000 steps into your fit. You can see the effects of changing the adaption starting step, as well as of how well tuned the initial step sizes are, here:
MCMC traces for chains with manual tuning (top), adaption applied from the first step (middle) and adaption applied from step 10,000 (bottom). The chain with adaption from step 10,000 shows the best fit, with the lowest auto-correlation and a trace that rapidly improves before stabilising.
In order to ensure a "valid" fit, adaption should be stopped at some point! You can pick this point either totally arbitrarily or by looking at the MCMC trace. If it seems like the trace has stopped improving for ~100,000 steps or so, then you're safe to assume the throw matrix has more or less converged and you can run the fit with adaption switched off.
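The start and stop points discussed above can be sketched with a toy one-parameter Metropolis sampler. This is a minimal illustration under stated assumptions, not the MaCh3 implementation: the function name, the arguments `start_adapt`, `end_adapt` and `update_every`, and the default values are all hypothetical.

```python
import math
import random


def adaptive_metropolis_1d(log_target, n_steps, start_adapt, end_adapt,
                           initial_width=0.1, update_every=100, seed=0):
    """Toy 1D Metropolis sampler with an adaption window (illustrative only)."""
    rng = random.Random(seed)
    x = 0.0
    log_p = log_target(x)
    width = initial_width
    mean, m2 = 0.0, 0.0  # running mean and sum of squared deviations (Welford)
    chain, widths = [], []

    for step in range(1, n_steps + 1):
        # Gaussian proposal centred on the current point.
        proposal = x + rng.gauss(0.0, width)
        log_p_new = log_target(proposal)
        # Metropolis accept/reject.
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        chain.append(x)

        # Update the running variance of the chain with the current step.
        delta = x - mean
        mean += delta / step
        m2 += delta * (x - mean)

        # Only adapt inside the window, and only every few steps.
        in_window = start_adapt <= step < end_adapt
        if in_window and step > 1 and step % update_every == 0:
            variance = m2 / (step - 1)
            if variance > 0.0:
                # 2.38^2 / n_pars scaling with n_pars = 1.
                width = 2.38 * math.sqrt(variance)
        widths.append(width)

    return chain, widths
```

For example, `adaptive_metropolis_1d(lambda x: -0.5 * (x / 2.0) ** 2, 5000, 500, 3000)` keeps the initial width for the first 499 steps, adapts between steps 500 and 2999, and then freezes the width for the rest of the chain. Steps taken after `end_adapt` come from a fixed proposal and so form an ordinary Metropolis chain.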
How Do I Know It's Worked?
As with all MCMC, the first step is to check your traces and auto-correlations. For AMCMC you expect the trace to get increasingly good as the chain goes on! Auto-correlations should be checked on the post-adaption part of a chain, since checking them while the chain is still adapting can lead to false-positive results simply because the adapting chain is not ergodic. Additionally, for long Asimov chains, one can look at the posteriors and check that they seem sensible and converge to the correct values.
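A minimal sketch of the post-adaption check described above is given below. The helper names and the `end_adapt` argument are assumptions for illustration; MaCh3 provides its own diagnostics, which this does not reproduce.

```python
def autocorrelation(values, lag):
    """Normalised sample auto-correlation of `values` at the given lag."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        raise ValueError("auto-correlation is undefined for a constant trace")
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def post_adaption_autocorrelation(chain, end_adapt, lag):
    """Auto-correlation using only the steps taken after adaption stopped.

    `end_adapt` is the (hypothetical) number of leading steps to discard.
    """
    return autocorrelation(chain[end_adapt:], lag)
```

A well-tuned chain should show the auto-correlation falling towards zero as the lag increases.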
When Does This Fail?
So far it seems that AMCMC fails for multi-modal and cyclical parameters, i.e. oscillation parameters (and anything correlated with them, like detector parameters). This can result in chains that converge to entirely incorrect results, as in the following figure.
MCMC fit with adaption switched off for all parameters (blue), applied to non-oscillation parameters only (green) and applied to all parameters (red). One can see that they converge to totally different values.
Adaptive MCMC can also run into problems with highly correlated parameters. This can be partially avoided by defining the highly correlated space as a separate "block". This splits your throw matrix into several independent block matrices, with the correlated parts kept separate from the less correlated parts. One example of this is with cross-section and flux parameters in T2K.
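The effect of blocking on the throw matrix can be sketched as follows: covariance entries between parameters in different blocks are set to zero, so each block is thrown independently. The function name and the way blocks are specified (lists of parameter indices) are assumptions for this example and do not reflect how MaCh3 is configured.

```python
def block_diagonal_throw_matrix(cov, blocks):
    """Zero every covariance entry that links two different blocks.

    `cov` is a square list-of-lists matrix; `blocks` is a list of lists of
    parameter indices, e.g. [[0, 1], [2]]. Illustrative helper only.
    """
    n = len(cov)
    block_of = {}
    for block_id, indices in enumerate(blocks):
        for i in indices:
            if i in block_of:
                raise ValueError(f"parameter {i} appears in more than one block")
            block_of[i] = block_id
    if sorted(block_of) != list(range(n)):
        raise ValueError("every parameter must belong to exactly one block")

    # Keep entries within a block, drop cross-block correlations.
    return [
        [cov[i][j] if block_of[i] == block_of[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

For instance, with three parameters and `blocks = [[0, 1], [2]]`, the correlation between parameters 0 and 1 is kept while parameter 2 is thrown independently of both.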
Additional Reading
For more information on adaptive MCMC in MaCh3: https://etheses.whiterose.ac.uk/id/eprint/35901/
For more information about adaptive MCMC in general: http://www.probability.ca/jeff/ftpdir/adaptex.pdf
Useful Video Lecture: https://www.youtube.com/watch?v=DwE2-YMQR5Y