Tutorial VII: HarryPlotter in Scripts - artus-analysis/Artus GitHub Wiki

This tutorial is based on the ROOT files generated in the first step.

General speeding-up, automation and parallelisation tools

Caching of inputs from ROOT trees

By default, HarryPlotter caches ROOT objects read in from trees and, if exactly the same object is needed a second time, loads it from the cache without touching the trees. This is only enabled for inputs that are given to HarryPlotter with absolute paths. HarryPlotter does not check whether the original trees have changed when looking for cached objects. This is why caching is (currently) disabled for relative paths, which usually point to files that change often (e.g. in testing phases).

It is possible to prevent HarryPlotter from searching for and using cached objects via

harry.py ... --redo-cache

but the newly read-in objects are saved in a cache in any case. If plots do not get updated as they should, a common cause is that the caches have not been re-done, and this option is very helpful for ruling out such mistakes. Cached objects are identified by a hash of all input options used for a given input object read from a tree. Keep in mind that hashes are not 100% unique, and use --redo-cache in case of doubt.
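The idea of identifying a cached object by a hash of its input options can be illustrated with a minimal Python sketch. The helper name cache_key, the choice of SHA-1 and the option names are assumptions made for illustration only; this is not HarryPlotter's actual implementation.

```python
import hashlib
import json


def cache_key(input_options):
    """Return a hex digest for a dict of input options (hypothetical helper)."""
    # Serialise with sorted keys so the key does not depend on dict ordering.
    serialised = json.dumps(input_options, sort_keys=True)
    return hashlib.sha1(serialised.encode("utf-8")).hexdigest()


# Assumed option names, chosen to resemble a HarryPlotter configuration.
options = {
    "files": ["/abs/path/gaussians.root"],
    "folders": ["gaussians"],
    "x_expressions": ["var1"],
}
same_options = dict(reversed(list(options.items())))    # same content, other order
other_options = dict(options, x_expressions=["var2"])   # one option changed

print(cache_key(options) == cache_key(same_options))    # identical options, identical key
print(cache_key(options) == cache_key(other_options))   # changed option, different key
```

A key of this kind depends only on the options and not on the content of the ROOT file, which matches the statement above that changed trees at the same path are not noticed.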

Caches are stored in a central place (${HP_WORK_BASE_COMMON}/caches), which is ideally a directory where multiple collaborators have read and write access. This way they can easily share caches and speed up their common analyses.

In case of any technical problem (e.g. a cache file cannot be found, opened, written, ...), HarryPlotter falls back to reading inputs directly from trees without any error message. The following command helps to inspect the caching mechanism:

harry.py ... --log-level debug | grep cache

Do not forget to delete old caches from time to time.
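Finding candidates for deletion could look like the following sketch. It only lists files by modification time and makes no assumption about how HarryPlotter names its cache files; the function find_old_files, the 30-day limit and the demo file names are hypothetical.

```python
import os
import tempfile
import time


def find_old_files(directory, max_age_days):
    """Return paths of regular files in directory not modified for max_age_days."""
    threshold = time.time() - max_age_days * 24 * 3600
    old_files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < threshold:
            old_files.append(path)
    return old_files


# Self-contained demonstration in a temporary directory instead of the real cache.
demo_dir = tempfile.mkdtemp()
for name, age_days in [("old_cache.root", 90), ("new_cache.root", 1)]:
    path = os.path.join(demo_dir, name)
    open(path, "w").close()
    mtime = time.time() - age_days * 24 * 3600
    os.utime(path, (mtime, mtime))  # pretend the file was last touched age_days ago

old = find_old_files(demo_dir, max_age_days=30)
print([os.path.basename(path) for path in old])
```

For the real cache, the directory would be os.path.expandvars("${HP_WORK_BASE_COMMON}/caches"). Since the cache may be shared with collaborators, it seems advisable to review the list before removing anything.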

Re-plotting from single JSON file

Together with the plotted graphics (or ROOT) files, HarryPlotter stores the full JSON configuration that was used to create each plot under the same name with the extension .json. These files can be used to re-do plots easily:

harry.py -j <JSON file> [<more arguments from HarryPlotter>]

You can even add more arguments from HarryPlotter to this call in case you want to overwrite settings in the JSON file or add new settings.

Re-plotting this way of course only works when the input files are still available (at the same location) and if the interface to HarryPlotter (program arguments) and its internal logic/behaviour do not change over time. To keep this functionality working reasonably well, developers are advised to maintain backward compatibility wherever possible.

It is possible to provide the content of a JSON file directly as the value of the -j option. As an example,

harry.py -i gaussians.root -f gaussians -x var1

is equivalent to

harry.py -j '{"files": ["gaussians.root"], "folders": ["gaussians"], "x_expressions": ["var1"]}'

This offers the possibility to edit JSON files on-the-fly, e.g. via

harry.py -j "`sed -e \"s@var1@var2@g\" plots/var1.json`"
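The same two steps, passing a configuration inline and editing it on the fly, can also be sketched in Python. The snippet only assembles command strings and does not call HarryPlotter; the dictionary is a stand-in for the content of plots/var1.json, whose real content is the full configuration and is not reproduced here.

```python
import json
import shlex

# Stand-in configuration with the three settings used in the example above.
config = {
    "files": ["gaussians.root"],
    "folders": ["gaussians"],
    "x_expressions": ["var1"],
}

# json.dumps yields the text passed as the value of -j;
# shlex.quote protects it from the shell.
config_text = json.dumps(config)
command = "harry.py -j " + shlex.quote(config_text)
print(command)

# Same effect as sed "s@var1@var2@g": replace every occurrence in the raw text.
edited_text = config_text.replace("var1", "var2")

# Parse once to make sure the edited text is still valid JSON.
edited_config = json.loads(edited_text)
edited_command = "harry.py -j " + shlex.quote(json.dumps(edited_config))
print(edited_command)
```

Like the sed call, the plain text replacement also changes any other setting that happens to contain var1. Editing the parsed dictionary instead (for example only edited_config["x_expressions"]) would be more selective.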

When using webplotting (--www), HarryPlotter saves outputs in a folder structure containing the current date. (This behaviour can also be overridden, see -h.) This is a very easy way to archive plots that should later go into scientific documents (theses, papers, ...).

Re-plotting from multiple JSON files

Re-plotting from multiple JSON files can be easily done in one go by using the following command:

multiplots_from_json_configs.py -j <JSON file 1> [<JSON file 2> ...] -n <number of parallel processes> -a " <more arguments from HarryPlotter>"

(For the Higgs community, there is a counterpart called makePlots_jsonConfigs.py.)
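When many JSON files are involved, the call to multiplots_from_json_configs.py can be assembled programmatically. The following sketch collects all .json files of a directory and builds the command string with the options -j, -n and -a described above; the helper multiplot_command and the demo file names are hypothetical, and the script itself is not executed.

```python
import glob
import os
import shlex
import tempfile


def multiplot_command(json_directory, n_processes, extra_arguments=""):
    """Assemble a multiplots_from_json_configs.py call for all JSON files in a directory."""
    json_files = sorted(glob.glob(os.path.join(json_directory, "*.json")))
    parts = ["multiplots_from_json_configs.py", "-j"]
    parts += [shlex.quote(path) for path in json_files]
    parts += ["-n", str(n_processes)]
    if extra_arguments:
        # The leading space is kept as in the command shown above.
        parts += ["-a", shlex.quote(" " + extra_arguments)]
    return " ".join(parts)


# Demonstration with empty stand-in files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ["var0.json", "var1.json"]:
    open(os.path.join(demo_dir, name), "w").close()

print(multiplot_command(demo_dir, 2, "--redo-cache"))
```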

Simple parallelizing tools

Artus contains two scripts to easily parallelise the execution of simple bash commands: runParallel.py for running parallel processes on the current machine and batchSubmission.py for submitting batch jobs with grid-control. They share the same interface, which can be inspected with -h. It is possible to simply pipe multiple commands into these scripts. The -b option of the batchSubmission.py script takes the same values as the same option in the Artus wrapper script, e.g. rwthcondor or naf.

Some examples where this could be useful for plotting:

  1. for FOLDER in `get_root_file_content.py gaussians3.root | sed -e "s@ (TTree)@@g"`; \
    do \
    	echo harry.py -i gaussians3.root -f ${FOLDER} -x var0 --filename ${FOLDER}_var0; \
    done | runParallel.py -n 3
  2. for BRANCH in `harry.py -i gaussians.root -f gaussians -q | grep Double_t | sed -e "s@ (Double_t)@@g"`; \
    do \
    	echo harry.py -i gaussians.root -f gaussians -x ${BRANCH}; \
    done | runParallel.py -n 5
  3. for CUT in {1..8}; \
    do \
    	echo harry.py -i gaussians.root -f gaussians -x var0 -w "\"(var1>=${CUT})*(var1<${CUT}+1)\"" --filename var1_cut_var0_${CUT}; \
    done | runParallel.py -n 8

These scripts can of course also parallelise other (mostly arbitrary) bash commands.
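The command lines of the third example can also be generated from Python rather than from a bash loop, which may be easier to maintain once the cuts become more complicated. This is a sketch under the assumption that the output is piped into runParallel.py as in the examples above; the function cut_commands is hypothetical.

```python
import shlex


def cut_commands(first_cut=1, last_cut=8):
    """Build one harry.py command line per var1 slice (mirrors example 3)."""
    commands = []
    for cut in range(first_cut, last_cut + 1):
        weight = "(var1>={0})*(var1<{0}+1)".format(cut)
        arguments = [
            "harry.py", "-i", "gaussians.root", "-f", "gaussians",
            "-x", "var0", "-w", weight,
            "--filename", "var1_cut_var0_{0}".format(cut),
        ]
        # Quote each argument so that parentheses and comparison signs survive the shell.
        commands.append(" ".join(shlex.quote(argument) for argument in arguments))
    return commands


if __name__ == "__main__":
    # One command per line, ready to be piped into runParallel.py.
    print("\n".join(cut_commands()))
```

Saved under a name such as make_cut_commands.py (a hypothetical file name), the output could be used via python make_cut_commands.py | runParallel.py -n 8.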

Plotting in scripts
