Tutorial IV: Analysis Modules - artus-analysis/Artus GitHub Wiki

This tutorial is based on the ROOT files generated in the first step.

Preparation

Fast creation of dummy inputs for testing

The input module InputInteractive can create ROOT histogram, graph and function objects on the fly without any need for complicated input files. These objects are ideal for quickly checking the implementation and usage of analysis modules. They might be less useful for real scientific work. The module is steered with a minimal set of parameters

  • -x/y/z or --x/y/z-expressions
  • --x/y/z-bins
  • --x/y/z-errors
  • --x/y-errors-up
  • -w or --weights
  • --scale-factors

and their values define the mode of the module for a given input. The mode can be different for each input object.

Histograms

Histograms are created if --x-bins is provided and the value of -x contains more than one element (elements are separated by whitespace). Currently, no profile histograms are supported at this level, but higher-dimensional histograms can be profiled later on in analysis modules.

  • TH1D:

     harry.py --input-modules InputInteractive --x-bins 3,0.5,3.5 -x "1 2 3 2 1 2 3" "2 1 2" -m LINE
     harry.py --input-modules InputInteractive --x-bins 3,0.5,3.5 -x "1 2 3 2 1" -w "0.5 1 2 0.5 1" -m E
     harry.py --input-modules InputInteractive --x-bins 3,0.5,3.5 -x "1 2 3" -y "3 2 1" --y-errors "2 1 0.5" -m E

    plots/dummy_histogram_1d_fill.png plots/dummy_histogram_1d_weights.png plots/dummy_histogram_1d_bin_contents.png

  • TH2D:

     harry.py --input-modules InputInteractive --x-bins 2,0.5,2.5 --y-bins 2,0.5,2.5 -x "1 1 2 2 2" -y "1 1 1 2 2" -w "1 1 1 1 0.5"
     harry.py --input-modules InputInteractive --x-bins 2,0.5,2.5 --y-bins 2,0.5,2.5 -x "1 1 2 2" -y "1 2 1 2" -z "1 2 3 4"

    plots/dummy_histogram_2d_fill.png plots/dummy_histogram_2d_bin_contents.png

  • TH3D: This module can fill 3D histograms in an analogous way, but HarryPlotter is not (yet) optimised for the display of 3D objects. However, these histograms can make sense in studies with further processing such as projections onto one or two axes.
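
To make the fill semantics of the TH1D examples above concrete, here is a minimal pure-Python sketch. It is not the InputInteractive implementation: the helper names parse_binning and fill_histogram are hypothetical, the binning string is assumed to mean <number of bins>,<lower edge>,<upper edge>, and entries outside the range are simply dropped instead of going into under- or overflow bins.

```python
def parse_binning(binning):
    """Parse '<n>,<low>,<high>' into (n, low, high). Hypothetical helper."""
    n_bins, low, high = binning.split(",")
    return int(n_bins), float(low), float(high)


def fill_histogram(binning, values, weights=None):
    """Fill a fixed-width 1D histogram from whitespace-separated strings.

    Assumption: entries outside [low, high) are dropped; a real ROOT
    histogram would put them into under-/overflow bins instead.
    """
    n_bins, low, high = parse_binning(binning)
    width = (high - low) / n_bins
    entries = [float(v) for v in values.split()]
    if weights is None:
        entry_weights = [1.0] * len(entries)
    else:
        entry_weights = [float(w) for w in weights.split()]
    contents = [0.0] * n_bins
    for value, weight in zip(entries, entry_weights):
        if low <= value < high:
            contents[int((value - low) / width)] += weight
    return contents


# Same inputs as the first two TH1D commands above.
print(fill_histogram("3,0.5,3.5", "1 2 3 2 1 2 3"))
print(fill_histogram("3,0.5,3.5", "1 2 3 2 1", "0.5 1 2 0.5 1"))
```

The second call shows how -w changes the bin contents: each entry contributes its weight rather than one count.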

Graphs

Graphs are created if at least -x and -y but no binning is provided.

  • TGraphErrors:

     harry.py --input-modules InputInteractive -x "1 2 3 4 5" -y "5 3 1 2 4" -m LP
     harry.py --input-modules InputInteractive -x "1 2 3 4 5" -y "5 3 1 2 4" --x-errors "0.5 0.5 0.5 0.5 0.5" --y-errors "2 2 1 1 2" -m P

    plots/dummy_graph.png plots/dummy_graph_errors.png

  • TGraphAsymmErrors:

     harry.py --input-modules InputInteractive -x "1 2 3 4 5" -y "5 3 1 2 4" --x-errors "0.5 0.5 0.5 0.5 0.5" --x-errors-up "0.7 0.7 0.7 0.7 0.7" --y-errors "2 2 1 1 2" --y-errors-up "1 1 2 2 1" -m P

    plots/dummy_graph_asymmerrors.png

  • TGraph2DErrors: This module can fill 3D graphs in an analogous way, but HarryPlotter is not (yet) optimised for the display of 3D objects.

Functions

Functions are created if a binning is provided and -x contains exactly one element, or if neither a binning nor -y is provided. The function string must not contain any whitespace; otherwise it would be split up like the entries of a histogram. The binning follows the common ROOT syntax and is given as <number of points to be drawn>,<lower boundary>,<upper boundary>.

  • TF1/2/3:

     harry.py --input-modules InputInteractive --x-bins "1000,-5,5" -x "x*x" "sqrt(abs(x))" -m C
     harry.py --input-modules InputInteractive --x-bins "1000,-5,5" --y-bins "1000,-5,5" -x "x*y" -m COLZ

    plots/dummy_function_1d.png plots/dummy_function_2d.png

    This module can create 3D functions in an analogous way, but HarryPlotter is not (yet) optimised for the display of 3D objects.
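
The rules that select between histograms, graphs and functions can be summarised in a few lines. The following is only a sketch of the decision logic as described in this section, restricted to the x/y case. The function name infer_mode and its arguments are hypothetical and not part of HarryPlotter.

```python
def infer_mode(x_expression, y_expression=None, x_bins=None):
    """Return 'histogram', 'graph' or 'function' for one input object.

    Hypothetical helper that mirrors the rules stated in the text:
    - binning given and more than one element in -x -> histogram
    - binning given and exactly one element in -x   -> function
    - no binning, but -x and -y given               -> graph
    - no binning and no -y                          -> function
    """
    n_elements = len(x_expression.split())
    if x_bins is not None:
        return "histogram" if n_elements > 1 else "function"
    if y_expression is not None:
        return "graph"
    return "function"


print(infer_mode("1 2 3 2 1", x_bins="3,0.5,3.5"))  # histogram
print(infer_mode("1 2 3 4 5", "5 3 1 2 4"))         # graph
print(infer_mode("x*x", x_bins="1000,-5,5"))        # function
```

Because elements are split at whitespace, a function string such as "x * x" would count as three elements, which is why function strings must be written without spaces.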

Commonly used analysis modules

All available analysis modules are shown with

harry.py --list-available-modules
harry.py --li # the python argument parser completes long arguments in case the mapping is unique

together with some help. Many modules that operate on two or all input histograms provide meaningful default parameters for the input nick lists. As always, use --log-level debug to inspect what HarryPlotter is really doing.

Modules usually add options to the argument parser of HarryPlotter. Their help is only available for modules that are added to a given run.

harry.py --analysis-modules <module> -h

The sequence of analysis modules matters and defines the run of HarryPlotter. The sequence in list type parameters of course also matters. But the sequence of parameters in general does not have an impact on the HarryPlotter run.

Many modules take lists as parameters in order to perform similar tasks on multiple inputs and to produce a set of new (or overwritten) ROOT objects. These new objects are sometimes appended to the end of the list of nicks and sometimes inserted close to the nicks of the inputs in a given iteration. This has to be considered when configuring the subsequent plotting modules. The final order of nicks to be plotted can be inspected via the following debug output.

harry.py ... --log-level debug | grep -i "final order"

Normalisation of histograms

  • Normalisation to unity integral: this module normalizes all histograms to unity integral. It has no options.

     harry.py --analysis-modules NormalizeToUnity -i gaussians.root -f gaussians -x var1 var2
  • Divide bin contents by bin widths: this module comes in handy when plotting histograms with non-equidistant binning in a non-distorted way.

     harry.py --analysis-modules NormalizeByBinWidth -i gaussians.root -f gaussians -x var1 var2 --x-bins "0 1 2 3 4 6 8 10"

plots/normalize_unity.png plots/normalize_bin_width.png
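
The two normalisations can be illustrated on plain lists of bin contents. This is a sketch, not the code of NormalizeToUnity or NormalizeByBinWidth: the function names are hypothetical, and the integral is assumed to be the plain sum of bin contents.

```python
def normalize_to_unity(contents):
    """Scale bin contents so that they sum to one (assumed definition of the integral)."""
    total = sum(contents)
    if total == 0:
        return list(contents)  # nothing to normalise
    return [c / total for c in contents]


def normalize_by_bin_width(contents, edges):
    """Divide each bin content by the width of its bin."""
    return [
        c / (high - low)
        for c, low, high in zip(contents, edges[:-1], edges[1:])
    ]


# Bin edges as in the --x-bins "0 1 2 3 4 6 8 10" example above.
edges = [0, 1, 2, 3, 4, 6, 8, 10]
contents = [2.0, 4.0, 6.0, 8.0, 8.0, 4.0, 2.0]  # made-up bin contents
print(normalize_by_bin_width(contents, edges))
print(normalize_to_unity([1.0, 3.0]))
```

In the made-up example the wide bins (width 2) would otherwise appear twice as populated as they are per unit of x, which is the distortion that the bin-width division removes.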

Ratios of histograms

There are two (or more) modules that perform divisions of histograms: Ratio and Divide. They perform more or less the same task. The main reason for having two modules is that, in a single run, one might first have to divide histograms, then process these ratios in another module and then divide its outputs again. In the current design this would not be possible with a single module.

harry.py --analysis-modules Ratio -i gaussians.root -f gaussians -x var1 var2 # use meaningful default parameters.
harry.py --analysis-modules Ratio --ratio-numerator-nicks nick0 nick1 --ratio-denominator-nicks nick1 nick0 --ratio-result-nicks ratio1 ratio2 -i gaussians.root -f gaussians -x var1 var2
harry.py --analysis-modules Ratio -i gaussians.root -f gaussians -x var1 var2 --subplot-nicks xyz # put ratio histogram into the upper plot by not letting its nick match with the one in --subplot-nicks.

plots/ratio_1.png plots/ratio_2.png plots/ratio_3.png

There are options on how to treat the uncertainties on numerator and denominator in the resulting histogram, see -h for details. These modules can also divide Graphs (in case where it makes sense) and histograms with different binnings, where the finer binning is taken for the resulting histogram.

If you need a proper uncertainty estimation for ratios (e.g. Clopper-Pearson confidence intervals), then you should have a look at the module Efficiency, which uses the constructor of TGraphAsymmErrors for the division.
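
As an illustration of one possible uncertainty treatment, the sketch below divides two histograms bin by bin and propagates the uncertainties of numerator and denominator as if they were uncorrelated. This is standard Gaussian error propagation and not necessarily what Ratio or Divide do by default. The function name is hypothetical, and setting bins with an empty denominator to zero is an assumption.

```python
import math


def ratio_with_errors(num, num_err, den, den_err):
    """Bin-wise ratio with uncorrelated Gaussian error propagation.

    For r = n / d the propagated uncertainty is
    sqrt((sigma_n / d)**2 + (n * sigma_d / d**2)**2).
    Assumption: bins with d == 0 get ratio 0 and uncertainty 0.
    """
    ratios, errors = [], []
    for n, sigma_n, d, sigma_d in zip(num, num_err, den, den_err):
        if d == 0:
            ratios.append(0.0)
            errors.append(0.0)
            continue
        ratios.append(n / d)
        errors.append(math.sqrt((sigma_n / d) ** 2 + (n * sigma_d / d ** 2) ** 2))
    return ratios, errors


ratios, errors = ratio_with_errors([4.0, 1.0], [2.0, 1.0], [2.0, 0.0], [1.0, 0.0])
print(ratios, errors)
```

For efficiencies, where the numerator is a subset of the denominator, this propagation is not appropriate, which is the reason for the separate Efficiency module mentioned above.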

Sum of histograms

There are two (or more) modules that perform additions of histograms: AddHistograms and SumOfHistograms. They perform more or less the same task. The main reason for having two modules is that, in a single run, one might first have to add histograms, then process these sums in another module and then sum up its outputs again. In the current design this would not be possible with a single module.

harry.py --analysis-modules AddHistograms --add-nicks "nick0 nick1" --add-result-nicks result -i gaussians.root -f gaussians -x var1 var2 # sum
harry.py --analysis-modules AddHistograms --add-nicks "nick0 nick1" --add-scale-factors "1 -1" --add-result-nicks result -i gaussians.root -f gaussians -x var1 var2 # difference

plots/sum_1.png plots/sum_2.png

Note that the histograms to be added up in one iteration need to be passed as a single element to --add-nicks. The nicks are then split internally at whitespace. Subtractions of histograms are also supported via the --add-scale-factors option.
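
The handling of a single --add-nicks element and its scale factors can be sketched as follows. This is not the AddHistograms code: the function name and the dictionary of bin contents are hypothetical stand-ins for the ROOT objects that HarryPlotter keeps per nick.

```python
def add_histograms(histograms, add_nicks, scale_factors=None):
    """Scaled bin-wise sum of the histograms named in one whitespace-separated string."""
    nicks = add_nicks.split()
    if scale_factors is None:
        factors = [1.0] * len(nicks)
    else:
        factors = [float(f) for f in scale_factors.split()]
    if len(factors) != len(nicks):
        raise ValueError("need exactly one scale factor per nick")
    result = [0.0] * len(histograms[nicks[0]])
    for nick, factor in zip(nicks, factors):
        for index, content in enumerate(histograms[nick]):
            result[index] += factor * content
    return result


# Made-up bin contents for two nicks.
histograms = {"nick0": [1.0, 2.0, 3.0], "nick1": [0.5, 0.5, 0.5]}
print(add_histograms(histograms, "nick0 nick1"))          # sum
print(add_histograms(histograms, "nick0 nick1", "1 -1"))  # difference
```

Passing "nick0 nick1" as one string corresponds to one iteration of the module; a second string would describe a second, independent sum.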

Fitting functions to histograms

harry.py --analysis-modules FunctionFit --functions gaus --function-nicknames function --function-parameters "1 0 1" --function-fit nick0 -i gaussians.root -f gaussians -x var1

plots/histogram_fit.png

Plotting of results is not yet really optimised. Fit results are printed in the terminal.

Efficiencies and cuts

harry.py -i gaussians.root -f gaussians -x var1 var2 --x-bins 100,-5,15 -m LINE --labels signal background --legend
harry.py --analysis-modules CutEfficiency --cut-efficiency-sig-nicks sig_noplot --cut-efficiency-bkg-nicks bkg_noplot --cut-efficiency-modes sigEff bkgEff sigRej bkgRej --cut-efficiency-nicks sigEff bkgEff sigRej bkgRej  -i gaussians.root -f gaussians -x var1 var2 --nicks sig_noplot bkg_noplot --x-bins 100,-5,15 -m LINE --labels bkgRej sigRej bkgEff sigEff --legend -m L --y-label Efficiency
harry.py --analysis-modules CutEfficiency --cut-efficiency-sig-nicks sig_noplot --cut-efficiency-bkg-nicks bkg_noplot --cut-efficiency-modes sigEffVsBkgEff sigEffVsBkgRej sigRejVsBkgEff sigRejVsBkgRej --cut-efficiency-nicks sigEffVsBkgEff sigEffVsBkgRej sigRejVsBkgEff sigRejVsBkgRej -i gaussians.root -f gaussians -x var1 var2 --nicks sig_noplot bkg_noplot --x-bins 100,-5,15 -m LINE --labels sigRejVsBkgRej sigRejVsBkgEff sigEffVsBkgRej sigEffVsBkgEff --legend -m L --x-label Background --y-label Signal

plots/efficiencies_1.png plots/efficiencies_2.png plots/efficiencies_3.png

There are many more options. Look at them with -h.
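
The quantities behind the efficiency and rejection modes can be illustrated on plain bin contents. The sketch below assumes that a cut keeps everything at or above the lower edge of a bin. The cut direction and the function names are assumptions and are not taken from the CutEfficiency module.

```python
def cut_efficiency(contents):
    """Fraction of all entries that survive a cut at the lower edge of each bin.

    Assumption: the cut keeps entries in the given bin and all bins above it.
    """
    total = sum(contents)
    passed = total
    efficiencies = []
    for content in contents:
        efficiencies.append(passed / total if total else 0.0)
        passed -= content
    return efficiencies


def rejection(efficiencies):
    """Rejection is the complement of the efficiency."""
    return [1.0 - eff for eff in efficiencies]


# Made-up bin contents for a signal and a background histogram.
signal = [0.0, 1.0, 3.0, 4.0, 2.0]
background = [4.0, 3.0, 2.0, 1.0, 0.0]
sig_eff = cut_efficiency(signal)
bkg_eff = cut_efficiency(background)

# One (background efficiency, signal efficiency) point per cut value,
# which is the kind of curve drawn with x-label Background and y-label Signal.
roc_points = list(zip(bkg_eff, sig_eff))
print(sig_eff, rejection(bkg_eff), roc_points)
```

Plotting signal efficiency against background rejection instead only requires replacing bkg_eff by rejection(bkg_eff) in the last step.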

ExportRoot plot module

Implementing new analysis modules

Design idea of HarryPlotter

HarryPlotter is constructed in such a way that every run of the main executable, e.g. harry.py, produces just one plot (which can be saved in multiple, but equivalent, file formats).

By design there is no interface to put in or take out data from plots, e.g. in the form of ROOT histograms or trees. HarryPlotter must read in information in input modules (from files) and can write out information in plot modules (into files). It cannot read in ROOT objects from memory and it is not possible to read ROOT objects from HarryPlotter's memory from outside.

However, actions on ROOT objects or any internal data of plots can be performed in analysis modules, and there is almost unlimited flexibility to do so. This has the advantage that analyses for which modules exist can be performed by invoking simple bash commands without writing any new code. There is a clear and well-defined interface.

Accessing Inputs

Saving outputs

Customizations
