How To: Uncertainties (for VBF H(inv) analysis) - alpakpinar/bucoffea Wiki

In general, uncertainties are computed within the analysis processors as follows: update the event weights and/or the event selection to represent the uncertainty source, then compute the impact on the shape of the fitting variable.

This section is meant to be a guide to how to compute several important sources of uncertainties using bucoffea. For the time being, the instructions are limited to the VBF H(inv) analysis.

Jet Energy Uncertainties

Jet energy uncertainties represent the uncertainties for the jet energy corrections and the smearing factors we apply to the jets. In practice, the uncertainties due to these are computed by varying the jet and MET momenta up and down (due to each uncertainty source), and computing the impact on the shape of the fitting variable.
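As a rough illustration of what "impact on the shape" means here, the sketch below takes the bin contents of a nominal histogram and of an up/down varied histogram and computes the per-bin ratio to nominal. The function name and all bin contents are hypothetical and are not taken from bucoffea.

```python
def shape_variation(nominal, varied):
    """Per-bin ratio varied / nominal; bins with an empty nominal get 1.0."""
    if len(nominal) != len(varied):
        raise ValueError("histograms must share the same binning")
    return [v / n if n > 0 else 1.0 for n, v in zip(nominal, varied)]


# Hypothetical bin contents of the fitting variable (not real analysis numbers)
nominal = [100.0, 50.0, 10.0, 0.0]
jes_up = [104.0, 53.0, 11.0, 0.0]
jes_down = [97.0, 48.0, 9.0, 0.0]

ratio_up = shape_variation(nominal, jes_up)
ratio_down = shape_variation(nominal, jes_down)
print(ratio_up, ratio_down)
```

Setting the ratio to 1.0 in empty bins is one possible convention (no variation where there is no nominal yield); the actual scripts may handle such bins differently.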

Computing Jet Energy Uncertainties

The implementation of the jet energy uncertainties is found in the 15Apr21_ReReco_UL_JES branch. If you're doing this computation, please check out this branch and pull the latest code.

Within this branch, the vbfhinvProcessor object is updated so that instead of running over all regions only once (the nominal case), it runs over them once per variation. The list of variations is specified here. The code looks like this:

# jerUp and jerDown -> Up/down variations of jet energy resolution (JER)
# jesTotalUp and jesTotalDown -> Up/down variations of jet energy scale (JES)
self._variations = [
    '',  # This is the nominal case
    '_jerUp',
    '_jerDown',
    '_jesTotalUp',
    '_jesTotalDown',
]

Note that the list of variations can be changed in a flexible way, due to this design (e.g. if we want to split the JES uncertainties).

This way, when the VBF H(inv) processor is executed, it will define separate regions for each variation, and will run over all regions. For example, for signal region, the processor will now run over:

  • The nominal signal region
  • The signal region with JER up/down variations
  • The signal region with JES up/down variations

This way, each histogram will have all the variations saved, and they can be used to compute the variations with respect to the nominal case (next section). Once the variations are specified as above, the user can simply run the vbfhinv processor as usual, as explained here.
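To make the "one region per variation" idea concrete, here is a minimal sketch of how a list of variation suffixes could be expanded into region names. The base region names and the helper function are hypothetical and only illustrate the pattern; the actual region definitions live in the processor code.

```python
# Variation suffixes, as in the list shown earlier on this page
variations = ['', '_jerUp', '_jerDown', '_jesTotalUp', '_jesTotalDown']

# Hypothetical base region names, used here only for illustration
base_regions = ['sr_vbf', 'cr_1m_vbf']


def expand_regions(base_regions, variations):
    """Return one region name per (base region, variation) pair."""
    return [f"{region}{var}" for region in base_regions for var in variations]


regions = expand_regions(base_regions, variations)
for name in regions:
    print(name)
```

With this kind of naming, the nominal region keeps its original name (empty suffix), so code that only looks at nominal regions does not need to change.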

Plotting the Jet Energy Uncertainties

Once the processor has been run and the outputs are merged into one accumulator (see here), a plotting script can be run to plot the uncertainties for a given dataset. This plotting script is located at plot/studies/vbf_uncertainties/jet_energy and is called plot_jes_variations.py. It does two things:

  • Plots the variations and the ratio of variations to nominal case
  • Saves the ratios in an output ROOT file (to be used later in the fit)

With the merged accumulator ready, the user can simply execute:

# Go to the directory containing the script
cd plot/studies/vbf_uncertainties/jet_energy
./plot_jes_variations.py /path/to/the/merged/accumulator

Other Experimental Uncertainties

Most other experimental uncertainties (including variations in prefire SF, pileup SF etc.) are built into the VBF H(inv) processor class. All the user needs to do is configure which uncertainties to run in the configuration files, using the fields here. The syntax for each uncertainty looks like this:

uncertainties:
  prefire_sf: True
  btag_sf: True
  ...

Each flag specifies one of the uncertainty sources. Setting a flag to True will run that uncertainty source and save the results of the up and down variations into the cnn_score_unc variable, which holds the uncertainties on the score distribution. See here for an example implementation in the vbfhinvProcessor class.
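The flag logic described above can be sketched as follows. This is an illustration only: the configuration is written as a plain Python dict instead of being loaded from the YAML file, the pileup_sf flag name and the "_up"/"_down" label format are assumptions, and the helper functions are hypothetical.

```python
# Stand-in for the "uncertainties" block of the configuration file.
# prefire_sf and btag_sf appear in the example above; pileup_sf is an assumed name.
config = {
    "uncertainties": {
        "prefire_sf": True,
        "btag_sf": True,
        "pileup_sf": False,
    }
}


def enabled_uncertainties(cfg):
    """Names of the uncertainty sources whose flag is set to True."""
    return [name for name, on in cfg.get("uncertainties", {}).items() if on]


def variation_labels(cfg):
    """One up and one down label per enabled source (label format is assumed)."""
    labels = []
    for name in enabled_uncertainties(cfg):
        labels += [f"{name}_up", f"{name}_down"]
    return labels


enabled = enabled_uncertainties(config)
labels = variation_labels(config)
print(enabled)
print(labels)
```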

Plotting the Uncertainties

Once the merged accumulator with the saved uncertainties is produced, these uncertainties can be plotted via the plot_uncertainties.py script, located here. The usage of this script is simple: it has a dataset variable and an uncertainties variable, which specify the dataset and the list of uncertainties to plot and save into a ROOT file for later use in the fit. The implementation can be found here. This script can be executed by pointing it to the merged accumulator input:

./plot_uncertainties.py /path/to/the/merged/input/accumulator

This will create plots and ROOT files per uncertainty source (as specified in uncertainties), under the output directory. Note that by default, this script will plot and save the uncertainties as a function of the CNN score, i.e. it will use the cnn_score_unc histogram. If you want to plot the variations as a function of mjj instead, you can specify a -v (or --variable) argument to the script:

# Plot and save the uncertainties as a function of mjj
./plot_uncertainties.py /path/to/the/merged/input/accumulator -v mjj

Note that the only supported options for the -v argument are mjj and cnn_score at the moment.
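For reference, a command-line interface of this shape (one positional input path, plus a -v/--variable option restricted to two values) can be written with argparse as sketched below. This is not the actual parser of plot_uncertainties.py; the function name and help strings are made up for illustration.

```python
import argparse


def parse_cli(argv=None):
    """Parse an input path and the variable to plot the uncertainties against."""
    parser = argparse.ArgumentParser(description="Plot uncertainties (illustrative sketch).")
    parser.add_argument("inpath", help="Path to the merged accumulator.")
    parser.add_argument(
        "-v", "--variable",
        default="cnn_score",
        choices=["cnn_score", "mjj"],
        help="Variable to plot the uncertainties as a function of.",
    )
    return parser.parse_args(argv)


args = parse_cli(["/path/to/the/merged/input/accumulator", "-v", "mjj"])
print(args.inpath, args.variable)
```

Using choices makes argparse reject any unsupported variable with a usage error before any histogram is read.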

Theory Uncertainties

Theory uncertainties, such as the scale and PDF uncertainties, are also computed within the vbfhinvProcessor class (see here). The set of uncertainties and the ROOT histograms containing each variation are specified in the vbfhinv.yaml file, starting from here. The variations of the Z(vv)/W(lv) and gamma/Z(vv) ratios are computed as a function of the generator-level boson pt; these are applied as variations of the event weights, and the resulting variations in the mjj, cnn_score and dnn_score distributions are saved.
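The reweighting step described above can be sketched in a few lines: look up a variation factor from the generator-level boson pt of each event, multiply it into the event weight, and fill the varied distribution. All bin edges, factors and event values below are hypothetical, and the helper functions are not part of bucoffea; in the real processor the factors come from the ROOT histograms listed in the configuration.

```python
from bisect import bisect_right

# Hypothetical variation factors in bins of generator-level boson pt (GeV)
pt_edges = [200.0, 300.0, 500.0, 1000.0]
factor_up = [1.02, 1.05, 1.10]


def variation_factor(pt, edges, factors):
    """Look up the factor for a given pt, clamping under/overflow to the edge bins."""
    idx = bisect_right(edges, pt) - 1
    idx = min(max(idx, 0), len(factors) - 1)
    return factors[idx]


def fill_histogram(values, weights, edges):
    """Weighted histogram with half-open bins [low, high); out-of-range values are dropped."""
    counts = [0.0] * (len(edges) - 1)
    for x, w in zip(values, weights):
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += w
    return counts


# Hypothetical events: generator-level boson pt, mjj and nominal weight
gen_pt = [250.0, 400.0, 700.0]
mjj = [300.0, 1200.0, 2500.0]
weights = [1.0, 1.0, 2.0]
mjj_edges = [200.0, 1000.0, 2000.0, 3000.0]

weights_up = [w * variation_factor(pt, pt_edges, factor_up) for w, pt in zip(weights, gen_pt)]
hist_nominal = fill_histogram(mjj, weights, mjj_edges)
hist_up = fill_histogram(mjj, weights_up, mjj_edges)
print(hist_nominal, hist_up)
```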

Saving the Theory Uncertainties into a ROOT file

Given the merged accumulator (obtained by running the VBF H(inv) processor and merging the output .coffea files), the theory uncertainties on the V+jets transfer factors can be easily saved into a ROOT file. This ROOT file will be used later in the fit framework to save the uncertainty shapes to the combine workspace. To be precise, four uncertainties on the transfer factors are computed here:

  • Renormalization scale (mu_R)
  • Factorization scale (mu_F)
  • PDF uncertainty
  • NLO EWK correction uncertainty
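An uncertainty on a transfer factor can be thought of as a ratio of ratios: the transfer factor (for example Z(vv)/W(lv)) is computed once with nominal weights and once with a varied weight, and the varied result is divided by the nominal one. The sketch below illustrates this with made-up yields; the function names are hypothetical, and which process is varied for each source is an assumption of this example rather than a statement about the script.

```python
def transfer_factor(numerator, denominator):
    """Per-bin ratio numerator / denominator; 0.0 where the denominator is empty."""
    return [n / d if d > 0 else 0.0 for n, d in zip(numerator, denominator)]


def transfer_factor_variation(num_nom, den_nom, num_var, den_var):
    """Per-bin ratio of the varied transfer factor to the nominal one."""
    tf_nom = transfer_factor(num_nom, den_nom)
    tf_var = transfer_factor(num_var, den_var)
    return [v / n if n > 0 else 1.0 for n, v in zip(tf_nom, tf_var)]


# Made-up yields per bin of the fitting variable
z_nominal = [100.0, 40.0]
w_nominal = [200.0, 100.0]
z_scale_up = [105.0, 44.0]  # assumed: only the Z yield is varied in this example

variation = transfer_factor_variation(z_nominal, w_nominal, z_scale_up, w_nominal)
print(variation)
```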

These uncertainties on the V+jets transfer factors can be saved by using the make_wz_uncertainties.py script, located under plot/studies/theory_uncertainties. Point this script to the location of the merged .coffea files and pass a few additional command-line arguments: the variable as a function of which the uncertainties are saved (-v, default is cnn_score) and the years to run on (-y, default is to run both 2017 and 2018).

The script can be used as follows:

# Run 2017 only
./make_wz_uncertainties.py /path/to/merged/acc -v cnn_score -y 2017

# Default is to run for 2017 and 2018 (will fail if you're missing data!)
./make_wz_uncertainties.py /path/to/merged/acc -v cnn_score

At the end, this script should output three ROOT files, with the uncertainties saved in the vbf_z_w_gjets_theory_unc_ratio_unc.root file (which will be used in the fit). You can read the next sub-section for plotting the uncertainties on these ratios.

Plotting the Theory Uncertainties

Once the ROOT file with the uncertainties is produced from the previous step, plotting them is easy! In the same directory, you can use the plot_uncertainties.py script, and point it to the vbf_z_w_gjets_theory_unc_ratio_unc.root file produced earlier. Similar to the other script, this takes a -y (or --years) argument which will specify the years to run on.

The script can be executed as follows:

./plot_uncertainties.py /path/to/root/file

The script should output the plots of uncertainties as PDF files in the same directory as the ROOT file, under the plots sub-directory.