Producing histograms - jniedzie/SVJanalysis_wiki GitHub Wiki

Overview

The code lives in histogramsProducer/1DHistograms and histogramsProducer/2DHistograms.

This code produces histograms and saves them as THist in a ROOT file.
These ROOT files must then be read to make plots.
More information about the plotting scripts can be found in plotting tools.

Input ROOT files

The input ROOT files from which to make histograms must have 3 trees:

  • Events: standard Ntuple events tree
  • CutFlow: a tree with a branch Initial, having one leaf with the number of events before any pre-selection cuts
  • Metadata: a tree with a branch GenCrossSection, having one leaf with the cross-section at generator level
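
A quick sanity check of an input file can be sketched as follows. The helper name `missing_trees` and the example key list are hypothetical; in practice the keys could come from something like `uproot.open(path).keys()`, which returns names with a cycle suffix such as `Events;1`.

```python
REQUIRED_TREES = ("Events", "CutFlow", "Metadata")


def missing_trees(keys):
    """Return the required tree names that are absent from `keys`.

    ROOT key names often carry a cycle suffix such as "Events;1",
    so everything after the first ";" is dropped before comparing.
    """
    present = {key.split(";")[0] for key in keys}
    return [name for name in REQUIRED_TREES if name not in present]


# Hypothetical key list for a file that lacks the CutFlow tree.
example_keys = ["Events;1", "Metadata;1"]
print(missing_trees(example_keys))
```

An empty list means that all three trees are present.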

Other files

Besides the input ROOT files, several files are needed for making histograms:

  • a binning file (example in binning_example.json and binning_example.py) which defines the binning of the different variables
  • a branch filters file (example in branchFilters.json) which defines event filters for particular branches
    e.g. MT01FatJet cannot be computed for events with fewer than 2 FatJets, so its value for such events is a dummy value that should not appear in the histogram. Hence the filter.
  • a filters file (example in filters_example.json and filters_example.py) which defines on-the-fly cuts at event or object level, useful for quickly checking the effect of a cut.

Usage

The latest usage can be printed with python make1DHistograms.py -h.

usage: make1DHistograms.py [-h] -i INPUT_FILE_NAMES -o OUTPUT_FILE_NAME [-c CHUNK_SIZE] [-m MAX_CHUNKS]
                           [-n N_WORKERS] [--skip_bad_files] [-e {iterative,futures,dask/condor,dask/slurm}]
                           [-voms VOMS] [-br BRANCHES] -b BINNING [-f FILTERS] [-bf BRANCH_FILTERS]
                           [-no_normalisation] [-l LUMI] [-metadata_source METADATA_SOURCE]
                           [-cutflow_source CUTFLOW_SOURCE]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE_NAMES, --input_file_names INPUT_FILE_NAMES
                        Input ROOT files name. Format: Comma separated list of files, filename expansion
                        or .txt file with list of files (1 file name per line)
  -o OUTPUT_FILE_NAME, --output_file_name OUTPUT_FILE_NAME
                        Output ROOT file name
  -c CHUNK_SIZE, --chunk_size CHUNK_SIZE
                        Size of the data chunks (default=100000)
  -m MAX_CHUNKS, --max_chunks MAX_CHUNKS
                        Maximum number of chunks to process, no flag means no maximum
  -n N_WORKERS, --n_workers N_WORKERS
                        Number of worker nodes (default=4)
  --skip_bad_files      Skip bad files
  -e {iterative,futures,dask/condor,dask/slurm}, --executor_name {iterative,futures,dask/condor,dask/slurm}
                        The type of executor to use (default=futures)
  -voms VOMS, --voms VOMS
                        Path to voms proxy, accessible to worker nodes
  -br BRANCHES, --branches BRANCHES
                        Branches to histogram
  -b BINNING, --binning BINNING
                        json or python file describing binning of the histograms
  -f FILTERS, --filters FILTERS
                        json or python file describing filters for collections and events
  -bf BRANCH_FILTERS, --branch_filters BRANCH_FILTERS
                        json file describing filters for branches
  -no_normalisation, --no_normalisation
                        Do not normalise histograms to expected yield
  -l LUMI, --lumi LUMI  Total luminosity for normalization of the histograms (default=137190.0 pb-1)
  -metadata_source METADATA_SOURCE, --metadata_source METADATA_SOURCE
                        ROOT file name with Metadata Tree or 'self' meaning taking the first ROOT file name
  -cutflow_source CUTFLOW_SOURCE, --cutflow_source CUTFLOW_SOURCE
                        ROOT file name with Cutflow Tree or 'self' meaning taking the first ROOT file name

The usage is the same for 2D histograms.
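
The help text does not spell out the formula behind "normalise histograms to expected yield". A common convention, assumed here and not confirmed by the source, is a per-event weight equal to cross-section times luminosity divided by the initial number of events, with the cross-section taken from the GenCrossSection branch and the number of events from the Initial branch. The function name and the numbers below are illustrative only.

```python
def normalisation_weight(cross_section_pb, lumi_pb_inv, n_initial):
    """Per-event weight scaling a sample to its expected yield.

    Assumed convention: weight = cross-section * luminosity / N_initial,
    with the cross-section in pb and the luminosity in pb^-1.
    """
    if n_initial <= 0:
        raise ValueError("n_initial must be positive")
    return cross_section_pb * lumi_pb_inv / n_initial


# Illustrative numbers: a 2 pb sample of 100000 generated events,
# normalised to the default luminosity of 137190 pb^-1.
weight = normalisation_weight(2.0, 137190.0, 100000)
print(weight)
```

With the -no_normalisation flag one would expect this weight to be replaced by 1, but that is also an assumption.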

-br/--branches flag

The --branches flag requires further explanation.

1D histograms

For 1D histograms the syntax is the following:

-br 'std:<regex1>,<regex2>;per_idx_collection:<collection1_name>=<n1>,<collection2_name>=<n2>;per_idx_branch:<branch1_name>=<n3>,<branch2_name>=<n4>'

Where:

  • <regex1>, <regex2> are regexes that select several branches, e.g. ^(Fat)?Jet_*
  • <collection1_name>, <collection2_name> are names of collections, e.g. FatJet
  • <branch1_name>, <branch2_name> are branch names, e.g. FatJet_pt
  • <n1>, <n2>, <n3>, <n4> are integers

There can be more than 2 regexes, collection names or branch names; they must be separated by commas.
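
To make the structure of this string concrete, here is a minimal parsing sketch. It is not the parser used by make1DHistograms.py; the function name `parse_branches_spec` and the returned dictionary layout are hypothetical, and the naive splitting assumes that regexes contain no comma or semicolon.

```python
def parse_branches_spec(spec):
    """Split a 1D branches specification into its three parts.

    Sections are separated by ";", the keyword and its items by the
    first ":", and the items by ",". Regexes containing "," or ";"
    are not supported by this simplified sketch.
    """
    parsed = {"std": [], "per_idx_collection": {}, "per_idx_branch": {}}
    for section in spec.split(";"):
        if not section:
            continue
        keyword, _, body = section.partition(":")
        if keyword not in parsed:
            raise ValueError(f"Unknown keyword: {keyword!r}")
        for item in body.split(","):
            if keyword == "std":
                parsed["std"].append(item)
            else:
                # Items look like "<name>=<n>", with <n> an integer.
                name, _, count = item.partition("=")
                parsed[keyword][name] = int(count)
    return parsed


spec = "std:^(Fat)?Jet_*;per_idx_collection:Jet=2,FatJet=2"
print(parse_branches_spec(spec))
```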

Example: If a file has the following branches in the events tree:

    MET_pt
    nJet
    Jet_pt
    Jet_eta
    Jet_phi
    Jet_mass
    Jet_ptD
    nFatJet
    FatJet_pt
    FatJet_eta
    FatJet_phi
    FatJet_mass
    FatJet_ptD

Then the following flag

-br 'std:^(Fat)?Jet_*;per_idx_collection:Jet=2,FatJet=2'

will make histograms of Jet_pt, Jet_eta, Jet_phi, Jet_mass, Jet_ptD, FatJet_pt, FatJet_eta, FatJet_phi, FatJet_mass and FatJet_ptD, and will also make histograms of these variables for the leading and sub-leading Jet and FatJet. It therefore produces 30 histograms: 10 inclusive ones, plus 10 for the leading and 10 for the sub-leading objects.
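
The count of 30 can be reproduced with a short sketch. The `expand` helper is hypothetical, it assumes that regexes are applied with `re.search` and that per-index histograms are named like `FatJet0_pt` (the naming used for leading and sub-leading objects elsewhere on this page); the actual implementation may differ.

```python
import re

branches = [
    "MET_pt", "nJet", "Jet_pt", "Jet_eta", "Jet_phi", "Jet_mass", "Jet_ptD",
    "nFatJet", "FatJet_pt", "FatJet_eta", "FatJet_phi", "FatJet_mass", "FatJet_ptD",
]


def expand(branches, regexes, per_idx_collection):
    """List the histogram names implied by a std + per_idx_collection spec."""
    # Inclusive histograms: every branch matched by at least one regex.
    selected = [b for b in branches if any(re.search(r, b) for r in regexes)]
    names = list(selected)
    # Per-index histograms: one per selected branch and per object index.
    for collection, n_objects in per_idx_collection.items():
        prefix = collection + "_"
        for branch in selected:
            if branch.startswith(prefix):
                variable = branch[len(prefix):]
                names.extend(f"{collection}{i}_{variable}" for i in range(n_objects))
    return names


names = expand(branches, [r"^(Fat)?Jet_*"], {"Jet": 2, "FatJet": 2})
print(len(names))
```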

You can find an example in runMake1DHistograms.sh.

2D histograms

For 2D histograms the syntax is the following:

-br '<var1>,<var2>;<var3>,<var4>'

or

-br 'rectangular:<regex1>,<regex2>'

or

-br 'rectangular_no_regex:<var1>,<var2>'

or

-br 'product:<var1>,<var2>;<var3>'

or

-br <file_name>.txt

where:

  • <var1>, <var2>, <var3>, <var4> are branch names (without tree name)
  • <regex1>, <regex2> are regexes
  • <file_name>.txt is a text file with two branch names separated by a comma on each line. Lines starting with # are ignored.

There can be as many branch names and regexes as needed. Several syntaxes can be used at the same time by separating them with ||.
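
As an illustration of the text file format, the sketch below reads pairs of branch names from a list of lines. The function name `read_branch_pairs` and the example content are hypothetical, and skipping blank lines is an assumption added for convenience.

```python
def read_branch_pairs(lines):
    """Return (x_branch, y_branch) tuples from lines like 'a,b'.

    Lines starting with "#" are ignored. Blank lines are skipped too,
    which is an assumption of this sketch.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Raises ValueError if the line does not hold exactly two names.
        x_branch, y_branch = (name.strip() for name in line.split(","))
        pairs.append((x_branch, y_branch))
    return pairs


example = """# leading vs subleading fat jet pt
FatJet0_pt,FatJet1_pt
FatJet_pt,FatJet_mass
"""
print(read_branch_pairs(example.splitlines()))
```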

Example:

-br 'product:FatJet_(pt|ptD|mass)'

will make histograms of FatJet_pt vs FatJet_ptD, FatJet_pt vs FatJet_mass and FatJet_ptD vs FatJet_mass. The same result can be achieved with:

-br 'product_no_regex:FatJet_pt,FatJet_ptD,FatJet_mass'

which can be more convenient than writing regexes.
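
The three pairs listed above are the unordered pairs of the matched branches. A minimal sketch of that expansion, assuming the regex must match the whole branch name (`re.fullmatch`) and using a hypothetical helper `unordered_pairs`:

```python
import re
from itertools import combinations


def unordered_pairs(branches, pattern):
    """Pair up all branches fully matched by `pattern`, each pair once."""
    matched = [b for b in branches if re.fullmatch(pattern, b)]
    return list(combinations(matched, 2))


# Hypothetical branch list; only three branches match the regex.
branches = ["FatJet_pt", "FatJet_ptD", "FatJet_mass", "FatJet_eta", "Jet_pt"]
pairs = unordered_pairs(branches, r"FatJet_(pt|ptD|mass)")
for x_branch, y_branch in pairs:
    print(f"{x_branch} vs {y_branch}")
```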

To make histograms of all fat jet features vs pt, do:

-br 'product:FatJet_pt;FatJet_.*'

To do the same for leading fat jets, do:

-br 'product_with_index_0:FatJet_pt;FatJet_.*'

The flag

-br 'FatJet0_pt,FatJet1_pt'

will make histograms of leading FatJet_pt vs subleading FatJet_pt. The events must have at least 2 FatJets. If that is not guaranteed, you can filter on the fly by having

{
    "events": "events.nFatJet > 1",
    "collections": "None"
}

in filters.json and running with the flag -f filters.json.
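
For illustration, the filter above can be produced with the standard json module, and its intended effect shown on a toy list of per-event FatJet counts. How the tool evaluates the "events" expression internally is not described here, so the mask below only mimics the meaning of the cut; the toy values are made up.

```python
import json

# Same content as the filters.json shown above.
filters = {"events": "events.nFatJet > 1", "collections": "None"}
filters_text = json.dumps(filters, indent=4)
print(filters_text)

# Toy per-event number of FatJets (hypothetical values).
n_fat_jet = [0, 2, 1, 3]
# Events kept by the cut "nFatJet > 1".
mask = [n > 1 for n in n_fat_jet]
print(mask)
```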

You can find an example in runMake2DHistograms.sh.
