Exercise A: The Event Loop - MinervaExpt/MINERvA-101-Cross-Section GitHub Wiki
An event loop reads independent neutrino interactions and reduces them to some simplified format. Every data preservation analysis starts with one or more event loops over so-called Tuples. There are actually other event loops that were used to produce these Tuples, but they have already been run by the MINERvA production team.
Our macro stage event loop will reduce Tuples to the ingredients we need for either a cross section, or for comparisons between data and simulation under modified simulation hypotheses. There are many shared ingredients, so both paths will start from the same point up until a designated split point further below. In this tutorial we're going to learn how to run and edit a simple event loop; analyses that produce multi-dimensional cross sections or have other unique computing needs might split their event loops up into more stages by memory requirements or run time. At the end of this exercise, you'll have a .root file with cross section ingredients from the runEventLoop program you updated.

For those on MINERvA, Alex Ramírez's talk explained what a cross section is, why they're important to measure, and one common procedure for extracting a cross section from our data using a Monte Carlo simulation. For those participating in an open data product tutorial, a slightly edited version of this talk should have been sent to you. Ask your contact if you have not received this. For those self-guiding, we are working to provide complementary resources to help contextualize the pursuit of measuring cross sections. They may already be available on the MINERvA Open Data Webpage.
How do we turn the formula above into a program for reducing AnaTuples to the histograms we'll need to extract a cross section?
Since i and j are true and reco bin indices, each symbol in Alex's figure (pictured above) is a histogram. The efficiency * acceptance correction will turn out to be the ratio of two different histograms. We can sort these histograms along two axes:
- reco/true variables on their axes
- which cuts are applied
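The extraction procedure can be written schematically. This is one common form of a binned cross-section extraction, consistent with the ingredient list below; the exact notation in Alex's figure may differ:

$$
\sigma_i = \frac{\sum_j U_{ij}\,\left(N^{\mathrm{data}}_j - N^{\mathrm{bkg}}_j\right)}{\epsilon_i \,\Phi\, T\, (\Delta x)_i}
$$

where $i$ runs over true bins and $j$ over reco bins, $U_{ij}$ is the unfolding correction built from the migration matrix, $\epsilon_i$ is the efficiency * acceptance, $\Phi$ is the integrated flux, $T$ is the number of targets, and $(\Delta x)_i$ is the bin width.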
The DATA histogram will also be unique because it comes from the data sample. All other cross section ingredients in this tutorial will be derived from a Monte Carlo simulation of the MINERvA experiment.
- The data can only be measured and selected in reco variables
- The backgrounds are subtracted from the data, so they must also be binned in reco variables and pass the reco selection.
- The efficiency numerator characterizes signal events that pass the reco selection. It is applied to the data after the migration matrix, so it must be calculated in true variables.
- The efficiency denominator counts all events that pass the signal definition, even if they fail the reco selection. The AnaTool that produced the MasterAnaDev AnaTuples we'll be using already threw out some signal events, so the efficiency denominator has its own larger tuple of events that we could have detected.
- In certain circumstances, modifications to both the efficiency denominator and efficiency numerator are needed to correct for incorrect/incomplete MC modeling of what we expect in the data. In these instances, there may be slight differences between the actual simulated number of events and the efficiency denominator. Thus, saving an additional true event rate may be needed. No such effects are currently implemented in this tutorial. For more information, see the later section "Modified Efficiency Denominators".
- The migration matrix converts reco variables to true variables, so it is a 2D histogram with one axis of each type. We're going to apply it before the efficiency * acceptance correction.
- The flux is simulated and constrained independently from our analysis. It is provided in true variables.
runEventLoop calculates the cross section ingredients in one pass over the data and MC samples. The flux our detector receives changed throughout data taking, so we split our data sample, and our MC sample to match, into flux periods called playlists. We're going to be analyzing the minervame1A playlist throughout this tutorial. runEventLoop is designed to process only 1 playlist at a time, and we have to tell it which files to process on the command line like this:
runEventLoop data.txt mc.txt
If you hit the TAB key after typing runEventLoop , the shell will list suitable file lists. This is a feature of the bash shell called auto-completion. If you run runEventLoop with no arguments, it will tell you about its input and output. This is typical behavior for programs in UNIX-based operating systems. If you read its "help text" closely, you'll notice that runEventLoop also looks for some environment variables.
The event loop itself is split into loops over 3 chains of TTrees for data, Monte Carlo, and the so-called "Truth Tree" that's used for the efficiency denominator. The event loop is really a loop over systematic Universes from the MINERvA Analysis Toolkit. Each Universe triggers a separate analysis with an assumption about our detector or our reconstruction changed. The event selections made are controlled by a PlotUtils::Cutter, and the physics model is reweighted to MnvTunev1 with a PlotUtils::Model. Histograms are mapped to physics quantities by PlotUtils::Variables. The main() function sets all of them up and delegates the event loops to 1 function for each chain. All of these things are more or less ready for you.
Check out the "exerciseA" branch:
#from opt/build...
cd ../../MINERvA-101-Cross-Section
git checkout Exercise-A
cd ../opt/build
make install #Compile the modified code to start this tutorial
If this is your first time installing the code and you have not confirmed that it will run, test this now. First set up the newly built version of the tutorial:
source /exp/minerva/app/users/$USER/MINERvA101_2025/opt/bin/setup.sh
If everything built successfully and you were able to source the setup script, you should be able to just run:
runEventLoop
Without any arguments, you will get a help message about the different data and MC playlists you can use as input, confirming that the tutorial was set up correctly.
You need to install the histograms we'll need to extract a cross section. The histograms need to be created for each physics observable, so make them member variables in util/Variable.h. You'll need to do these things for each histogram:
- Initialize it in `InitializeMCHists()` or `InitializeDataHists()`
- `Write()` it in `WriteMC()` or `WriteData()` so it can be plotted in another program
- Call its `SyncCVHistos()` in the Variable's `SyncCVHistos()` so that systematics work correctly
- `Fill()` it in `runEventLoop.cpp` in a function like `LoopAndFillData()` after a Universe passes the necessary Cuts.
There are a few example histograms that are not cross section ingredients to get you started.
Both the Cross Section Approach and Data/MC Comparisons care about the reco-level distributions. I will refer to these ingredients, which do not care about the true variables or signal definitions, as "reconstruction ingredients". Start by installing these, specifically for muon transverse momentum ("pT"), since they are shared between the two paths this tutorial supports. The "Split Point" is where the cross section and data/MC comparison approaches separate in these instructions.
A hint to save some running time: do you need to run the truth loop when only filling these reconstruction histograms? Recall that in C++, you can turn a line of code into a comment by preceding it with `//`:
- e.g. `//I just want to tell whoever is reading my code hello`
Once you have the reconstruction ingredients (for simulation and data) installed (and built: recall you have to rebuild after any changes you want to test), you're ready to use the majority of runEventLoop! In a new shell:
- Set up ROOT. This might be automatic on your personal laptop. On the MINERvA GPVMs, `source <pathToTutorialArea>/opt/bin/setupROOT6OnGPVMs.sh`.
- `source <pathToTutorialArea>/opt/bin/setup.sh` - Do this every time you open a new terminal or log into a GPVM to work on this project.
- Create a working directory that is not in the source code directory or the build area. On my laptop, I put this in `Documents/MINERvA101`. On the GPVMs, I use `/exp/minerva/data/users/$USER/MINERvA101`.
- Create a "playlist" text file that tells `runEventLoop` where to find the local tuple files you are working with (if you're not streaming them through the xrootd paths). The files provided in the `opt/etc/playlists` area are examples of these for streaming all or a subset of a playlist.
- `runEventLoop --help` to get a summary of how it works.
- Test your event loop with shorter file lists and systematics off:
  - Turn off systematics with `export MNV101_SKIP_SYST=1`
  - Generate truncated playlists like the example short streaming playlists provided in the area, but for your local files if you are using them.
  - If you are streaming and need a shorter playlist from the full files, `tail -n 5 <Playlist>_<MC/Data>.txt > short<MC/Data>.txt` will grab the last five lines of the input file.
- Now `runEventLoop shortData.txt shortMC.txt`. Do you get a message that says "Success" at the end? If not, ask the instructor for help.
"The Solution" contains the .root files that you can open and compare your results to.
Now, let's look at some of the histograms you produced to make sure they're not empty. They're in .root files, so we're going to open them interactively using ROOT's C++ interpreter:

```
root -l runEventLoopMC.root
.ls #Lists ROOT objects like histograms in the current TDirectory
TFile**  runEventLoopMC.root
 TFile*  runEventLoopMC.root
  ...
  KEY: PlotUtils::MnvH1D  pTmu_data;1  pTmu
  ...
pTmu_data->SetLineWidth(3) #Make histogram line easier to see
pTmu_data->Draw("HIST")
```
You should see something like this, with potentially different statistics:

If you are carrying through the data/MC comparison portion of this tutorial, jump to Exercise E. Otherwise, continue here: finish adding the cross section ingredients, then rerun the code to fill them.
Once you have the cross section ingredients installed, you're ready to use ALL of runEventLoop. The instructions for running the event loop are essentially the same as above. As before, in a new shell:
- Set up ROOT. This might be automatic on your personal laptop. On the GPVMs, `source /path/to/opt/bin/setupROOT6OnGPVMs.sh`.
- `source /path/to/opt/bin/setup.sh` - Do this every time you open a new terminal or log into a GPVM to work on this project.
- Create a working directory that is not in the source code directory or the build area. On my laptop, I put this in `Documents/MINERvA101`. On the GPVMs, I use `/exp/minerva/data/users/$USER/MINERvA101`.
- `runEventLoop --help` to get a reminder of how it works.
- Test your event loop with shorter file lists and systematics off:
  - Turn off systematics with `export MNV101_SKIP_SYST=1`
  - RECALL from before: generate truncated playlists with `tail -n 5 MAD_minervame1A_MC_xrd.txt > shortMC.txt` and `tail -n 5 MAD_minervame1A_DATA_xrd.txt > shortData.txt`
  - On the GPVMs, instead do `tail -n 5 /exp/minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_MC_xrd.txt > shortMC.txt` and `tail -n 5 /exp/minerva/app/users/$USER/MINERvA101/opt/etc/playlists/MAD_minervame1A_DATA_xrd.txt > shortData.txt`
- Now `runEventLoop shortData.txt shortMC.txt`. Do you get a message that says "Success" at the end? If not, ask the instructor for help (if you are not self-guiding).
Now, let's look at some of the histograms you produced to make sure they're not empty. They're in .root files, so we're going to open them interactively using ROOT's C++ interpreter:

```
root -l runEventLoopMC.root
.ls #Lists ROOT objects like histograms in the current TDirectory
TFile**  runEventLoopMC.root
 TFile*  runEventLoopMC.root
  # The other objects listed should include ALL the ingredients you expect!
  ...
  KEY: PlotUtils::MnvH1D  pTmu_efficiency_numerator;1  pTmu
  ...
pTmu_efficiency_numerator->SetLineWidth(3) #Make histogram line easier to see
pTmu_efficiency_numerator->Draw("HIST")
```
You should see something like this, with potentially different statistics:

If you are pursuing the data/MC comparison pathway, continue to Exercise E. If you are following an in-person tutorial, skip ahead to Exercise D to plot results; you can return to Exercises B/C if you have time, or follow up on them later. For those self-guiding, continue on to Exercise B. This is the intended order when you have the full time to explore, as it provides important insights into the care behind cross section extraction results.
The "main" branch combines the code for the solutions to all exercises. Compare its runEventLoop.cpp and utils/Variable.h to your own with git diff. Compare the histograms in runEventLoopMC.root and runEventLoopData.root to example data and example MC.
runEventLoop takes a lot longer when it's accounting for our standard set of systematic uncertainties. Repeat the instructions to run it, but git checkout main and do not set MNV101_SKIP_SYST this time. This will take 1-2.5 hours to complete. If you have problems with your laptop disconnecting from a GPVM while the tutorial is running, read about using GNU screen to wrap your interactive session.
Generally, we expect the efficiency denominator to exactly match the true simulated (or, more accurately, the CV model tune's) event rate. This holds in most cases, but there are a few where it may not be fully true. A concrete example on MINERvA is that we know of deficiencies in our GEANT model's handling of neutron interactions, and we have a few different reweighting schemes to correct for them. When reweighting the CV model of hadron interactions, there are so many potential combinations of hadron species, momenta, and interaction histories along the paths through the detector that it is nearly impossible to sample the entire phase space. We do not apply this weight in the tutorial: it should not affect the selection or variables at all in the inclusive selection, and turning the weights on slows the code down significantly. For a future edit to this tutorial, we are developing an example of how to apply these weights and correctly carry around the unmodified true event rate.
Because the reweighting is incomplete, the true event rate can be biased by applying these weights. Where possible, we renormalize the total effect of the weights so that the true event rate remains unchanged. However, these weights may need to differ across different exclusive signal definitions, and since we cannot cover all possible use cases, one must consider how to deal with this problem for whatever selection is being used. In such a case, one would apply the weights in the efficiency denominator as well. This does its best to accurately reflect the real detector and to cancel the biases of the reweighting procedure in the efficiency division, which is applied to the data.
However, the bias may exist in the efficiency denominator itself and carry through an MC-only extraction, so that your result doesn't match the underlying model exactly. This is where it becomes useful to carry around the true event rate separately from the efficiency denominator. Doing so lets you know what the model predicts from inside your code for comparisons, while still checking that everything for the cross section extraction is filled correctly. It should be filled at the same moment and for the same events as the efficiency denominator, but weighted only by the weights meant to achieve the CV model tune of the neutrino interactions, not those that correct for GEANT/detector mismodelings. As noted above, we are developing an example of how to carry this around.