Producing histograms - jniedzie/SVJanalysis_wiki GitHub Wiki
The code is in histogramsProducer/1DHistograms and histogramsProducer/2DHistograms.
It produces histograms and saves them as THist in a ROOT file.
These ROOT files must then be read to make plots.
More information about the plotting scripts can be found in plotting tools.
The input ROOT files from which to make histograms must have 3 trees:
- Events: standard Ntuple events tree
- CutFlow: a tree with a branch Initial, having one leaf with the number of events before any pre-selection cuts
- Metadata: a tree with a branch GenCrossSection, having one leaf with the cross-section at generator level
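The CutFlow and Metadata trees provide what a normalisation to expected yield needs. A minimal sketch, assuming the usual scale factor luminosity × cross-section / initial number of events; the formula and the function name are assumptions, not taken from the histogram producer code, and the numbers other than the default luminosity are made up:

```python
def expected_yield_scale(lumi_pb, gen_cross_section_pb, n_initial):
    """Scale factor for raw histogram counts (assumed formula, hypothetical helper).

    lumi_pb              -- integrated luminosity in pb-1
    gen_cross_section_pb -- generator-level cross-section in pb (Metadata tree)
    n_initial            -- number of events before any pre-selection (CutFlow tree)
    """
    if n_initial <= 0:
        raise ValueError("n_initial must be positive")
    return lumi_pb * gen_cross_section_pb / n_initial


# Illustrative values only: 137190.0 pb-1 is the default luminosity of the
# script, the cross-section and event count are invented for the example.
scale = expected_yield_scale(137190.0, gen_cross_section_pb=2.0, n_initial=100000)
print(scale)
```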
Besides the input ROOT files, several files are needed for making histograms:
- a binning file (example in binning_example.json and binning_example.py) which defines the binning of the different variables
- a branch filters file (example in branchFilters.json) which defines event filters for particular branches, e.g. MT01FatJet cannot be computed for events with fewer than 2 FatJets, so the value of MT for events with fewer than 2 FatJets is a dummy value that should not appear in the histogram. Hence the filter.
- a filters file (example in filters_example.json and filters_example.py) which defines on-the-fly cuts at event or object level to quickly check some cuts
The latest usage can be printed with python make1DHistograms.py -h:
usage: make1DHistograms.py [-h] -i INPUT_FILE_NAMES -o OUTPUT_FILE_NAME [-c CHUNK_SIZE] [-m MAX_CHUNKS]
[-n N_WORKERS] [--skip_bad_files] [-e {iterative,futures,dask/condor,dask/slurm}]
[-voms VOMS] [-br BRANCHES] -b BINNING [-f FILTERS] [-bf BRANCH_FILTERS]
[-no_normalisation] [-l LUMI] [-metadata_source METADATA_SOURCE]
[-cutflow_source CUTFLOW_SOURCE]
optional arguments:
-h, --help show this help message and exit
-i INPUT_FILE_NAMES, --input_file_names INPUT_FILE_NAMES
Input ROOT files name. Format: Comma separated list of files, filename expansion
or .txt file with list of files (1 file name per line)
-o OUTPUT_FILE_NAME, --output_file_name OUTPUT_FILE_NAME
Output ROOT file name
-c CHUNK_SIZE, --chunk_size CHUNK_SIZE
Size of the data chunks (default=100000)
-m MAX_CHUNKS, --max_chunks MAX_CHUNKS
Maximum number of chunks to process, no flag means no maximum
-n N_WORKERS, --n_workers N_WORKERS
Number of worker nodes (default=4)
--skip_bad_files Skip bad files
-e {iterative,futures,dask/condor,dask/slurm}, --executor_name {iterative,futures,dask/condor,dask/slurm}
The type of executor to use (default=futures)
-voms VOMS, --voms VOMS
Path to voms proxy, accessible to worker nodes
-br BRANCHES, --branches BRANCHES
Branches to histogram
-b BINNING, --binning BINNING
json or python file describing binning of the histograms
-f FILTERS, --filters FILTERS
json or python file describing filters for collections and events
-bf BRANCH_FILTERS, --branch_filters BRANCH_FILTERS
json file describing filters for branches
-no_normalisation, --no_normalisation
Do not normalise histograms to expected yield
-l LUMI, --lumi LUMI Total luminosity for normalization of the histograms (default=137190.0 pb-1)
-metadata_source METADATA_SOURCE, --metadata_source METADATA_SOURCE
ROOT file name with Metadata Tree or 'self' meaning taking the first ROOT file name
-cutflow_source CUTFLOW_SOURCE, --cutflow_source CUTFLOW_SOURCE
ROOT file name with Cutflow Tree or 'self' meaning taking the first ROOT file name
Same usage for 2D histograms.
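The three accepted forms of --input_file_names (comma separated list, filename expansion, .txt file) could be resolved along these lines. This is only a sketch: resolve_input_files is a hypothetical helper, not the function used by the scripts, and keeping unmatched entries unchanged is an assumption made here so that names the local glob cannot see are not silently dropped.

```python
import glob


def resolve_input_files(spec):
    """Turn an --input_file_names value into a list of file names (hypothetical helper)."""
    # Case 1: a .txt file listing one file name per line.
    if spec.endswith(".txt"):
        with open(spec) as listing:
            return [line.strip() for line in listing if line.strip()]

    # Cases 2 and 3: comma separated entries, each possibly a wildcard pattern.
    files = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        matches = sorted(glob.glob(entry))
        # Assumption: an entry that matches nothing locally is kept as it is.
        files.extend(matches if matches else [entry])
    return files


print(resolve_input_files("file1.root,file2.root"))
```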
The --branches flag (short form -br, see the usage above) requires further explanation.
For 1D histograms the syntax is the following:
-br 'std:<regex1>,<regex2>;per_idx_collection:<collection1_name>=<n1>,<collection2_name>=<n2>;per_idx_branch:<branch1_name>=<n3>,<branch2_name>=<n4>'
Where:
- <regex1>, <regex2> are regexes that select several branches, e.g. ^(Fat)?Jet_*
- <collection1_name>, <collection2_name> are names of collections, e.g. FatJet
- <branch1_name>, <branch2_name> are branch names, e.g. FatJet_pt
- <n1>, <n2>, <n3>, <n4> are integers giving the number of leading objects to histogram individually (2 means leading and sub-leading)
There can be more than 2 regexes, collection names or branch names. They have to be separated by a comma (,).
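To make the structure of this string concrete, here is a sketch of how it could be split into its three sections. parse_1d_branches is a hypothetical function written for illustration, not the parser of make1DHistograms.py. Note that with this simple splitting a regex cannot itself contain a comma or a semicolon.

```python
def parse_1d_branches(spec):
    """Split a 1D --branches string into its sections (hypothetical helper)."""
    parsed = {"std": [], "per_idx_collection": {}, "per_idx_branch": {}}
    for section in spec.split(";"):
        if not section.strip():
            continue
        # Only the first ':' separates the section name from its content.
        kind, _, body = section.partition(":")
        kind = kind.strip()
        if kind not in parsed:
            raise ValueError(f"unknown section '{kind}'")
        items = [item.strip() for item in body.split(",") if item.strip()]
        if kind == "std":
            parsed["std"].extend(items)
        else:
            # Items look like <name>=<integer>.
            for item in items:
                name, _, number = item.partition("=")
                parsed[kind][name.strip()] = int(number)
    return parsed


print(parse_1d_branches("std:^(Fat)?Jet_*;per_idx_collection:Jet=2,FatJet=2"))
```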
Example: If a file has the following branches in the events tree:
MET_pt
nJet
Jet_pt
Jet_eta
Jet_phi
Jet_mass
Jet_ptD
nFatJet
FatJet_pt
FatJet_eta
FatJet_phi
FatJet_mass
FatJet_ptD
Then the following flag
-br 'std:^(Fat)?Jet_*;per_idx_collection:Jet=2,FatJet=2'
will make histograms of Jet_pt, Jet_eta, Jet_phi, Jet_mass, Jet_ptD, FatJet_pt, FatJet_eta, FatJet_phi, FatJet_mass and FatJet_ptD, and will also make histograms of these variables for the leading and sub-leading Jet and FatJet. It will therefore produce 30 histograms: 10 inclusive ones plus 10 for each of the two indices.
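The count of 30 can be reproduced with a small sketch. It assumes that regexes are matched with re.search, that a collection owns every branch starting with its name followed by an underscore, and that per-index histograms are named like FatJet0_pt (the naming used for 2D histograms further down). expand_1d is a hypothetical function, not the one in the repository.

```python
import re

BRANCHES = [
    "MET_pt", "nJet", "Jet_pt", "Jet_eta", "Jet_phi", "Jet_mass", "Jet_ptD",
    "nFatJet", "FatJet_pt", "FatJet_eta", "FatJet_phi", "FatJet_mass", "FatJet_ptD",
]


def expand_1d(branches, regexes, per_idx_collection):
    """List the histograms requested by the std and per_idx_collection sections."""
    # Inclusive histograms: every branch matched by at least one regex.
    histograms = [b for b in branches if any(re.search(r, b) for r in regexes)]
    # Per-index histograms: one per branch of the collection and per index.
    for collection, n_objects in per_idx_collection.items():
        prefix = collection + "_"
        for branch in branches:
            if branch.startswith(prefix):
                variable = branch[len(prefix):]
                for index in range(n_objects):
                    histograms.append(f"{collection}{index}_{variable}")
    return histograms


names = expand_1d(BRANCHES, ["^(Fat)?Jet_*"], {"Jet": 2, "FatJet": 2})
print(len(names))
```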
You can find an example in runMake1DHistograms.sh.
For 2D histograms the syntax is the following:
-br '<var1>,<var2>;<var3>,<var4>'
or
-br 'rectangular:<regex1>,<regex2>'
or
-br 'rectangular_no_regex:<var1>,<var2>'
or
-br 'product:<var1>,<var2>;<var3>'
or
-br <file_name>.txt
where:
- <var1>, <var2>, <var3>, <var4> are branch names (without tree name)
- <regex1>, <regex2> are regexes
- <file_name>.txt is a text file with two branch names separated by a comma on each line. Lines starting with # are ignored.
There can be as many branch names and regexes as needed.
Several syntaxes can be used at the same time by separating them with ||.
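As an illustration of the text-file form, the following sketch reads such a list of pairs. read_branch_pairs is a hypothetical helper; skipping blank lines is an assumption that the description above does not state.

```python
def read_branch_pairs(lines):
    """Read (x, y) branch pairs from the lines of a <file_name>.txt file (hypothetical helper)."""
    pairs = []
    for line in lines:
        line = line.strip()
        # Comment lines start with '#'; blank lines are skipped (assumption).
        if not line or line.startswith("#"):
            continue
        names = [name.strip() for name in line.split(",")]
        if len(names) != 2:
            raise ValueError(f"expected two branch names, got: {line!r}")
        pairs.append((names[0], names[1]))
    return pairs


example_file = [
    "# leading vs sub-leading fat jet pt",
    "FatJet0_pt,FatJet1_pt",
    "FatJet_pt, FatJet_mass",
]
print(read_branch_pairs(example_file))
```

The function accepts any iterable of lines, so an open file object can be passed directly.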
Example:
-br 'product:FatJet_(pt|ptD|mass)'
will make histograms of FatJet_pt vs FatJet_ptD, FatJet_pt vs FatJet_mass and FatJet_ptD vs FatJet_mass. The same can be achieved with:
-br 'product_no_regex:FatJet_pt,FatJet_ptD,FatJet_mass'
which can be more convenient than writing regexes.
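The single-regex product example amounts to taking every unordered pair of the matching branches. A sketch under that reading, assuming the regex is matched with re.search; product_pairs is a hypothetical function and which variable ends up on which axis is not specified here:

```python
import re
from itertools import combinations

FATJET_BRANCHES = ["FatJet_pt", "FatJet_eta", "FatJet_phi", "FatJet_mass", "FatJet_ptD"]


def product_pairs(branches, regex):
    """All unordered pairs of the branches matched by one regex (hypothetical helper)."""
    selected = [branch for branch in branches if re.search(regex, branch)]
    return list(combinations(selected, 2))


for x, y in product_pairs(FATJET_BRANCHES, "FatJet_(pt|ptD|mass)"):
    print(x, "vs", y)
```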
To make histograms of all fat jet features vs pt, do:
-br 'product:FatJet_pt;FatJet_.*'
To do the same for leading fat jets, do:
-br 'product_with_index_0:FatJet_pt;FatJet_.*'
-br 'FatJet0_pt,FatJet1_pt'
will make histograms of the leading FatJet_pt vs the subleading FatJet_pt. The events should have at least 2 FatJets. If they do not, you can filter on the fly by having
{
"events": "events.nFatJet > 1",
"collections": "None"
}
in filters.json and running with the flag -f filters.json.
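The effect of such an event filter can be pictured with a toy, event-by-event sketch. How the scripts actually evaluate the expression is not described here; the per-event eval below and the select_events helper are assumptions for illustration only, and the real code presumably works on whole arrays of events.

```python
from types import SimpleNamespace


def select_events(event_records, expression):
    """Keep the events for which the filter expression is true (toy illustration)."""
    selected = []
    for record in event_records:
        # Expose the record as 'events' so that 'events.nFatJet' resolves.
        namespace = {"events": SimpleNamespace(**record)}
        # eval is acceptable for a trusted, hand-written filter string only.
        if eval(expression, {"__builtins__": {}}, namespace):
            selected.append(record)
    return selected


toy_events = [{"nFatJet": 0}, {"nFatJet": 1}, {"nFatJet": 2}, {"nFatJet": 3}]
print(select_events(toy_events, "events.nFatJet > 1"))
```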
You can find an example in runMake2DHistograms.sh.