How To: Uncertainties (for VBF H(inv) analysis) - alpakpinar/bucoffea Wiki
In general, uncertainties are computed within the analysis processors as follows: the event weights and/or the event selection are updated to represent the uncertainty source, and the impact on the shape of the fitting variable is computed.
This section is meant to be a guide to how to compute several important sources of uncertainties using
bucoffea. For the time being, the instructions are limited to the VBF H(inv) analysis.
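As a minimal sketch of the weight-based case described above, the fitting variable can be histogrammed once per set of event weights. This is not the bucoffea implementation: the function name, the toy inputs and the per-event scale factors are all hypothetical and only illustrate the idea.

```python
import numpy as np

def shape_variations(values, nominal_weights, sf_up, sf_down, bin_edges):
    """Histogram the fitting variable with nominal, up and down event weights.

    All argument names are hypothetical; sf_up / sf_down are per-event
    multiplicative factors representing one uncertainty source.
    """
    hists = {}
    for label, sf in (("nominal", 1.0), ("up", sf_up), ("down", sf_down)):
        hists[label], _ = np.histogram(
            values, bins=bin_edges, weights=nominal_weights * sf
        )
    return hists

# Toy inputs (made up for illustration)
values = np.array([0.1, 0.3, 0.6, 0.9])      # fitting variable per event
weights = np.ones(4)                         # nominal event weights
sf_up = np.array([1.1, 1.1, 1.2, 1.2])       # up variation of a scale factor
sf_down = np.array([0.9, 0.9, 0.8, 0.8])     # down variation of a scale factor
edges = np.array([0.0, 0.5, 1.0])

hists = shape_variations(values, weights, sf_up, sf_down, edges)
```

Comparing `hists["up"]` and `hists["down"]` to `hists["nominal"]` bin by bin gives the shape impact of that source.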
Jet Energy Uncertainties
Jet energy uncertainties represent the uncertainties on the jet energy corrections and the smearing factors we apply to the jets. In practice, they are computed by varying the jet and MET momenta up and down (once per uncertainty source), and computing the impact on the shape of the fitting variable.
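To illustrate how a momentum variation propagates to an observable, the sketch below shifts the pt of two jets up and down and recomputes the dijet invariant mass. It assumes massless jets, ignores the corresponding MET variation, and uses made-up jet kinematics and relative uncertainties. The helper names are hypothetical and are not taken from bucoffea.

```python
import math

def dijet_mass(jet1, jet2):
    """Invariant mass of two massless jets given as (pt, eta, phi) tuples."""
    pt1, eta1, phi1 = jet1
    pt2, eta2, phi2 = jet2
    return math.sqrt(
        2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    )

def vary_jet(jet, rel_unc, direction):
    """Scale the jet pt by (1 +/- rel_unc); direction is +1 (up) or -1 (down)."""
    pt, eta, phi = jet
    return (pt * (1.0 + direction * rel_unc), eta, phi)

# Toy jets and relative pt uncertainties (made up for illustration)
jets = [(100.0, 2.0, 0.0), (80.0, -2.0, 3.0)]
rel_uncs = [0.03, 0.05]

mjj_nominal = dijet_mass(*jets)
mjj_up = dijet_mass(*(vary_jet(j, u, +1) for j, u in zip(jets, rel_uncs)))
mjj_down = dijet_mass(*(vary_jet(j, u, -1) for j, u in zip(jets, rel_uncs)))
```

Because the event selection also depends on the jet momenta, the varied events can migrate in or out of a region, which is why the processor reruns the full selection for each variation.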
Computing Jet Energy Uncertainties
The implementation of the jet energy uncertainties is found in the
15Apr21_ReReco_UL_JES branch. If you are doing this computation, please check out this branch and pull the latest code.
Within this branch, the
vbfhinvProcessor object is updated so that instead of running over all regions only once (the nominal case), it runs once for each variation. The list of variations is specified here. The code looks like this:
# jerUp and jerDown -> Up/down variations of jet energy resolution (JER)
# jesTotalUp and jesTotalDown -> Up/down variations of jet energy scale (JES)
self._variations = [
    '',  # This is the nominal case
    '_jerUp', '_jerDown',
    '_jesTotalUp', '_jesTotalDown',
]
Note that, thanks to this design, the list of variations can easily be changed (e.g. if we want to split the JES uncertainties).
When the VBF H(inv) processor is executed, it defines a separate region for each variation and runs over all of them. For example, for the signal region, the processor now runs over:
- The nominal signal region
- The signal region with JER up/down variations
- The signal region with JES up/down variations
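The expansion of regions per variation can be pictured with a short sketch. The base region names and the helper function below are hypothetical and serve only to show how a list of variation suffixes could multiply the set of regions. The actual processor code may organise this differently.

```python
# Variation suffixes as in the list above ('' is the nominal case)
variations = ['', '_jerUp', '_jerDown', '_jesTotalUp', '_jesTotalDown']

# Hypothetical base region names, used for illustration only
base_regions = ['sr_vbf', 'cr_1m_vbf']

def expand_regions(base_regions, variations):
    """Return one region name per (base region, variation) pair."""
    return [f"{region}{var}" for region in base_regions for var in variations]

all_regions = expand_regions(base_regions, variations)
```

With five variations, each base region yields five entries, so adding or splitting a source only requires editing the list of variations.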
This way, each histogram will have all the variations saved, and they can be used to compute the variations with respect to the nominal case (next section). Once the variations are specified as above, the user can simply run the
vbfhinv processor as usual, as explained here.
Plotting the Jet Energy Uncertainties
Once the processor has been run and the outputs are merged into one accumulator (see here), a plotting script can be run to plot the uncertainties for a given dataset. This plotting script is located on this path:
plot/studies/vbf_uncertainties/jet_energy and it is called
plot_jes_variations.py. It does two things:
- Plots the variations and the ratio of variations to nominal case
- Saves the ratios in an output ROOT file (to be used later in the fit)
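The ratio step can be sketched as follows, assuming the bin contents for each variation have already been extracted from the accumulator into plain arrays. The function name, the dictionary layout and the choice of setting the ratio to 1 in empty nominal bins are assumptions made for this example, and writing the result to a ROOT file is left out.

```python
import numpy as np

def variation_ratios(hists, nominal_key=""):
    """Divide each varied histogram by the nominal one, bin by bin.

    `hists` maps a variation suffix to an array of bin contents.
    Bins with an empty nominal are assigned a ratio of 1 (an assumption).
    """
    nominal = np.asarray(hists[nominal_key], dtype=float)
    ratios = {}
    for key, counts in hists.items():
        if key == nominal_key:
            continue
        counts = np.asarray(counts, dtype=float)
        ratio = np.ones_like(nominal)
        np.divide(counts, nominal, out=ratio, where=nominal > 0)
        ratios[key] = ratio
    return ratios

# Toy bin contents (made up for illustration)
hists = {
    "": [10.0, 20.0, 0.0],
    "_jesTotalUp": [11.0, 21.0, 0.0],
    "_jesTotalDown": [9.0, 19.0, 0.0],
}
ratios = variation_ratios(hists)
```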
With the merged accumulator ready, the user can simply execute:
# Go to the directory
cd plot/studies/vbf_uncertainties/jet_energy
./plot_jes_variations.py /path/to/the/merged/accumulator
Other Experimental Uncertainties
Most other experimental uncertainties (including variations in prefire SF, pileup SF etc.) are built into the VBF H(inv) processor class. All the user needs to do is configure which uncertainties to run in the configuration files, using the fields here. The syntax for each uncertainty looks like this:
uncertainties:
  prefire_sf: True
  btag_sf: True
  ...
Each flag specifies one uncertainty source. Setting a flag to
True runs that uncertainty source and saves the results of the up and down variations into the
cnn_score_unc variable, corresponding to the uncertainties on the score distribution. See here for an example implementation in the
Plotting the Uncertainties
Once the merged accumulator with the saved uncertainties is produced, these uncertainties can be plotted via the
plot_uncertainties.py script, located here. The script is simple to use: it has a
dataset and an
uncertainties variable, which specify the dataset and the list of uncertainties to plot and save into a ROOT file for later use in the fit. The implementation can be found here. This script can be executed by pointing it to the merged accumulator input:
./plot_uncertainties.py /path/to/the/merged/input/accumulator
This will create plots and ROOT files per uncertainty source (as specified in
uncertainties), under the
output directory. Note that by default, this script will plot and save the uncertainties as a function of the CNN score, i.e. it will use the
cnn_score_unc histogram. If you want to plot the variations as a function of
mjj instead, you can specify the
-v (or --variable) argument to the script:
# Plot and save the uncertainties as a function of mjj
./plot_uncertainties.py /path/to/the/merged/input/accumulator -v mjj
Note that the only supported options for the
-v argument are mjj and
cnn_score at the moment.
Theory Uncertainties
Theory uncertainties, such as the scale and PDF uncertainties, are also computed within the
vbfhinvProcessor class (see here). The set of uncertainties and the ROOT histograms containing each variation are stored in the
vbfhinv.yaml file, starting from here. The variations of
gamma/Z(vv) ratios are computed as a function of generator-level boson pt. These variations are then applied to the event weights, and the resulting variations in the
dnn_score distributions are saved.
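A pt-dependent weight variation of this kind can be sketched as a binned lookup. The bin edges, the per-bin variation values and the function name are made up for illustration, and events outside the binned range are clamped to the first or last bin, which is an assumption of this sketch rather than documented behaviour.

```python
import numpy as np

def lookup_variation(gen_boson_pt, pt_edges, variation_per_bin):
    """Return the per-event variation factor for each generator-level boson pt.

    Events below (above) the binned range take the first (last) bin value.
    """
    idx = np.digitize(gen_boson_pt, pt_edges) - 1
    idx = np.clip(idx, 0, len(variation_per_bin) - 1)
    return np.asarray(variation_per_bin)[idx]

# Toy inputs (made up for illustration)
pt_edges = np.array([200.0, 400.0, 800.0])       # GeV
variation_per_bin = np.array([1.02, 1.05])       # varied / nominal, per pt bin
gen_boson_pt = np.array([250.0, 500.0, 1000.0, 100.0])
nominal_weights = np.ones(4)

factors = lookup_variation(gen_boson_pt, pt_edges, variation_per_bin)
varied_weights = nominal_weights * factors
```

Filling the score histogram with `varied_weights` instead of `nominal_weights` then gives the varied distribution for that source.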
Saving the Theory Uncertainties into a ROOT file
After running the VBF H(inv) processor and merging the output
.coffea files into a single accumulator, the theory uncertainties on V+jets transfer factors can be saved into a ROOT file. This ROOT file is later used in the fit framework to save the uncertainty shapes to the
combine workspace. To be precise, four uncertainties on the transfer factors are computed here:
- Renormalization scale
- Factorization scale
- PDF uncertainty
- NLO EWK correction uncertainty
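An uncertainty on a transfer factor can be expressed as the varied ratio divided by the nominal ratio. The sketch below shows that double ratio for generic numerator and denominator histograms. Whether a given source is varied in the numerator, the denominator or both is an analysis choice not covered here, and the function name and toy numbers are hypothetical. The toy bins are all non-empty, so no division guard is included.

```python
import numpy as np

def transfer_factor_ratio(num_nom, den_nom, num_var, den_var):
    """Return (num_var / den_var) / (num_nom / den_nom), bin by bin."""
    tf_nom = np.asarray(num_nom, dtype=float) / np.asarray(den_nom, dtype=float)
    tf_var = np.asarray(num_var, dtype=float) / np.asarray(den_var, dtype=float)
    return tf_var / tf_nom

# Toy yields per bin (made up for illustration)
z_nom = [100.0, 50.0]
w_nom = [200.0, 100.0]
z_up = [110.0, 52.0]   # numerator varied up; denominator kept nominal here

tf_unc_up = transfer_factor_ratio(z_nom, w_nom, z_up, w_nom)
```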
These uncertainties on V+jets transfer factors can be saved by using the
make_wz_uncertainties.py script, located under
plot/studies/theory_uncertainties. Just point this script to the location of the merged
.coffea files, together with a few additional command line arguments that specify the variable used to bin the uncertainties (default is
cnn_score), and the years to run on (default is to run both 2017 and 2018).
The script can be used as follows:
# Run 2017 only
./make_wz_uncertainties.py /path/to/merged/acc -v cnn_score -y 2017
# Default is to run for 2017 and 2018 (will fail if you're missing data!)
./make_wz_uncertainties.py /path/to/merged/acc -v cnn_score
At the end, this script should output three ROOT files, with the uncertainties saved in the
vbf_z_w_gjets_theory_unc_ratio_unc.root file (which will be used in the fit). You can read the next sub-section for plotting the uncertainties on these ratios.
Plotting the Theory Uncertainties
Once the ROOT file with the uncertainties is produced from the previous step, plotting them is easy! In the same directory, you can use the
plot_uncertainties.py script, and point it to the
vbf_z_w_gjets_theory_unc_ratio_unc.root file produced earlier. Similar to the other script, this takes a
-y (or --years) argument, which specifies the years to run on.
The script can be executed as follows:
The script should output the plots of uncertainties as PDF files in the same directory as the ROOT file, under the