11. Step size tuning

What step size tuning is and why you need to know about it

It is good practice to run MCMC diagnostics before looking at posterior distributions, although in most cases you will stumble on the need to diagnose only after checking the posteriors. If your posterior looks like this, you either do not have enough steps or the chain is wrongly tuned.

Before discussing step size tuning, we first need to understand how a step is proposed (a minimal sketch follows the list below).

  • Gaussian throw - a random number drawn from a Gaussian distribution centred at the previous step, with a spread equal to the parameter error.
  • Correlated throw - the proposed steps for two correlated parameters should be more likely to change in the same direction, hence we include the correlations using the Cholesky decomposition of the covariance matrix.
  • Individual step scale - a user-selected value for each parameter, by default = 1; by step size tuning in most cases we mean modifying this value.
  • Global step scale - a user-selected value, the same for all parameters in a given covariance class. For example, the value is the same for all xsec-based parameters.
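
A rough sketch of how these pieces combine into a single proposal (illustrative Python only, not MaCh3 code; the function and variable names are made up for this example, and the exact order in which the scales are applied is a simplification):

```python
import numpy as np

rng = np.random.default_rng()

def propose_step(current, cov, indiv_scale, global_scale):
    """Propose a new point from the current one.

    current      : current parameter values, shape (N,)
    cov          : parameter covariance matrix, shape (N, N)
    indiv_scale  : individual (per-parameter) step scales, shape (N,)
    global_scale : single scale shared by all parameters in this covariance class
    """
    # The Cholesky decomposition of the covariance encodes the parameter
    # correlations, so correlated parameters are thrown in a correlated way.
    chol = np.linalg.cholesky(cov)
    # Uncorrelated unit-Gaussian throw, one random number per parameter.
    gauss = rng.standard_normal(len(current))
    # Correlated throw, scaled per parameter and globally.
    return current + global_scale * indiv_scale * (chol @ gauss)
```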

MCMC diagnostic

There are several plots worth studying for MCMC diagnostics. You can find the executables which produce them in the Diagnostics folder; the text below is based on [1].

Autocorrelations

We can study the chain autocorrelation, which tells us how strongly particular steps are correlated. To quantify it, we introduce the quantity Lag(n) = corr(X_i, X_{i−n}), which tells us how correlated steps that are n steps apart are; the maximal lag considered here is n = 25000. The figure shows the autocorrelations for the studied chains: we want our steps to be as random and as weakly correlated as possible so that the chain converges quickly. The rule of thumb is for the autocorrelation to drop below 0.2 by Lag(n = 10000). This isn't a strict criterion, so if the autocorrelation sometimes drops slightly slower than the blue line in the figure it's not a problem.

Example of a well-tuned step scale (colours represent chains with different starting positions)

If your autocorrelation looks like this, though, you really should increase the step size. One exception would be a parameter which has no effect: imagine you run an ND-only fit while the parameter affects the FD only. Then the autocorrelation is expected to look bad.
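
For reference, a minimal sketch of how Lag(n) can be computed for a single parameter (illustrative Python, not the MaCh3 Diagnostics executable):

```python
import numpy as np

def autocorrelation(chain, max_lag=25000):
    """Return Lag(n) = corr(X_i, X_{i-n}) for n = 0..max_lag.

    Assumes the chain has (well) more than max_lag steps.
    """
    chain = np.asarray(chain, dtype=float)
    centred = chain - chain.mean()
    var = centred.var()
    lags = [1.0]  # Lag(0) is 1 by definition
    for n in range(1, max_lag + 1):
        lags.append(np.mean(centred[n:] * centred[:-n]) / var)
    return np.array(lags)

# Rule of thumb: autocorrelation(chain, 10000)[-1] should be below ~0.2.
```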

Trace

The figure shows the trace, which is the value of a chosen parameter at each step. It can be seen that at first the chains have different traces, but after a thousand steps they start to stabilise and oscillate around a very similar value, indicating that the chains converged and a stationary state was achieved.
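
A minimal sketch of drawing such a trace with matplotlib (illustrative only; the Diagnostics executable produces these plots for you):

```python
import matplotlib.pyplot as plt

def plot_trace(chains, param_name="parameter"):
    """chains: list of 1-D arrays, one array of parameter values per chain."""
    for i, chain in enumerate(chains):
        plt.plot(chain, label=f"chain {i}")
    plt.xlabel("step")
    plt.ylabel(param_name)
    plt.legend()
    plt.show()
```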

Acceptance Probability Batch

The figure shows the mean value of the acceptance probability (A(θ′, θ)) in intervals of 5k steps (batched means). This quantity is quite high at the beginning, indicating that the chain hasn't converged yet. When the chain gets close to the stationary state it starts to stabilise. Orange stabilised fastest, while blue and green are slowly catching up, and red hasn't converged yet.
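
The batched mean itself is simple to reproduce; a sketch for a single chain, assuming you have the per-step acceptance probabilities stored:

```python
import numpy as np

def batched_acceptance(accept_prob, batch_size=5000):
    """Mean acceptance probability A(theta', theta) in consecutive batches of steps."""
    accept_prob = np.asarray(accept_prob, dtype=float)
    n_batches = len(accept_prob) // batch_size
    # Drop the incomplete final batch, then average within each batch.
    trimmed = accept_prob[:n_batches * batch_size]
    return trimmed.reshape(n_batches, batch_size).mean(axis=1)
```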

R Hat

Usually we run several chains which are later combined. There is a danger that not all chains will converge, and using them would then bias the results. R-hat is meant to estimate whether the chains converged successfully or not. According to Gelman, you should calculate R-hat for at least 4 chains, and R-hat > 1.1 may indicate that the chains converged wrongly. Below you can find an example of chains which converged wrongly and one where they converged successfully.

Chains converged to different values

Successfully converged chains
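
For reference, a sketch of the (non-split) Gelman-Rubin R-hat for a single parameter across several equal-length chains (illustrative Python, not the MaCh3 Diagnostics executable):

```python
import numpy as np

def r_hat(chains):
    """chains: array of shape (n_chains, n_steps) for a single parameter."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # B: between-chain variance, W: mean within-chain variance.
    B = n * chain_means.var(ddof=1)
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance.
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

# Rule of thumb (Gelman): run at least 4 chains; R-hat > 1.1 hints at
# convergence problems.
```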

Geweke

The Geweke diagnostic helps to define what the burn-in should be. In this case you should select a burn-in of around 15%, as this is where the distribution stabilises.
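
A rough sketch of a Geweke-style check, comparing the mean of an early window of the chain with the mean of the last half as a z-score (the proper diagnostic uses spectral-density variance estimates, so treat this only as an illustration):

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """z-score comparing the first `first` fraction with the last `last` fraction."""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int((1 - last) * n):]
    # Ignores autocorrelation within each window, so this is only a rough check.
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
    )
```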

Global Step Scale

According to [2] (Section 3.2.1), the global step scale should be set to 2.38^2/N_params, where N_params is the number of parameters. Keep in mind this refers to the global step scale for a given covariance object, like the xsec covariance or the detector covariance.
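
In code this is a one-liner; for example, for a hypothetical 50-parameter xsec covariance:

```python
def global_step_scale(n_params: int) -> float:
    # Suggested global step scale from [2], Section 3.2.1.
    return 2.38**2 / n_params

print(global_step_scale(50))  # ~0.113
```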

Manual Tuning

This procedure is very tedious and requires intuition about how a given parameter behaves. It is a bit of dark magic; however, skilled users should be able to tune relatively fast compared with non-skilled users. The process is as follows: you run the chain, run the diagnostic executable, look at the plots, adjust the step scale, then run the fit again, and the process repeats. Each time you should look at autocorrelations, traces, etc. (see the discussion above). Another important trick is not to run the full fit: instead of running a 10M-step chain, you might run 200k steps. The number of steps depends on the number of parameters and the Lag(n = ?) you are interested in.

There are a few things you should be aware of when tuning:

  • Parameters with a broad range may have a higher step scale, while those with a narrow range should have a smaller one to reduce the probability of going out of bounds.
  • Highly correlated parameters should have similar step scales; for edge cases like ~100% correlation the step scales should be identical!
  • Autocorrelations should drop below 0.2 for Lag(n = 10000). If it drops immediately then the step scale is too big.
  • Study the trace: check whether it is converging and exploring the phase space fast enough. Exploring too fast is also wrong.
  • Study the acceptance probability. If every step is accepted then the scale is too small, while if barely any step is getting accepted you might consider decreasing the step scale.
  • Doing an LLH scan and assigning a step scale based on its result is also a good idea.

The last point is that a data fit may require a different tuning than the Asimov fit. Still, if you tune for Asimov it should be easy to re-tune for a data fit.

Manually Tuning Individual Step Scales in MaCh3

Currently MaCh3 configures systematics in two ways. The first is simply through YAML configs, which are used for cross-section systematics. Changing the step scale of a parameter in a YAML config is simply a case of modifying the "StepScale" option for that parameter in the YAML file.
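
As a rough illustration (only the "StepScale" key is taken from the text above; the surrounding field names are made up for this sketch and will differ from the real configs), a parameter entry might look like:

```yaml
# Hypothetical sketch only; field names other than StepScale are illustrative.
Systematics:
  - Name: "example_xsec_norm"   # hypothetical parameter name
    StepScale: 0.2              # individual step scale for this parameter
```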

Some systematics, for example oscillation parameters, still use an XML->ROOT pipeline. Here the systematics are initially defined in an XML file; for example, in DUNE-MaCh3 they are located here. Individual parameter step scales can then be modified through the stepscale option in the XML (although this may vary between systematics, as it's a legacy system!). Finally, these can be converted into a ROOT file with a suitable script; in the DUNE example this is located here.

Adaptive MCMC

Hopefully by this point you've realised that step size tuning is

  1. Hard
  2. Tedious

Thankfully, we can automate it! It turns out that, provided your Markov chain satisfies the Markov chain central limit theorem [2], it's optimal to propose steps from the posterior covariance matrix (multiplied by a scale factor).

The config then has the following options, which let you tune this:

['AdaptionOptions']
['Settings']
AdaptionStartThrow   : [int]  At what step do we start throwing from the posterior matrix
AdaptionEndUpdate    : [int]  At what step do we fix the matrix we're throwing from
AdaptionUpdateStep   : [int]  How frequently do we update our throw matrix->posterior covariance
AdaptionStartUpdate  : [int]  At what step do we start updating our posterior covariance?

[CovarianceSettings.'Matrix Name'] ('Matrix Name'={'Xsec', 'Osc',...})
    DoAdaption       : [bool] Are we doing adaption?
    MatrixBlocks     : List[List[int]]  Indices for blocks that adapt together 
  
    # These are only relevant if you want to throw from an external matrix
    UseExternalMatrix  : [bool] Do you want to use a matrix from an external file?
    MatrixFileName     : [str]  Name of file with external matrix
    MatrixName         : [str]  Name of external matrix
    MeansName          : [str]  Name of external means [only necessary if you plan to continue updating the matrix]

Adaptive step size tuning is still a little bit fiddly, but it works well if you have "Gaussian-ish" parameters [for example cross-section!]. Generally I'd recommend updating every few thousand steps and stopping the updates after around 1,000,000 steps.
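
As a rough example of how these options might be filled in (illustrative values only; the nesting follows the sketch above and the exact layout may differ in your configs):

```yaml
# Illustrative sketch; values and nesting are assumptions, only the key names
# come from the option list above.
AdaptionOptions:
  Settings:
    AdaptionStartThrow: 10000     # start throwing from the posterior matrix
    AdaptionStartUpdate: 0        # start accumulating the posterior covariance
    AdaptionUpdateStep: 5000      # update the throw matrix every 5k steps
    AdaptionEndUpdate: 1000000    # freeze the throw matrix after ~1M steps
  CovarianceSettings:
    Xsec:
      DoAdaption: true
      MatrixBlocks: [[0, 1, 2], [3, 4]]   # parameter indices adapted together
```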

References

[1] Kamil Skwarczynski PhD thesis
[2] https://asp-eurasipjournals.springeropen.com/track/pdf/10.1186/s13634-020-00675-6.pdf

If you have complaints blame: Kamil Skwarczynski
