Tutorial II: Parameter Expansions - KIT-CMS/Artus GitHub Wiki

This tutorial is based on the ROOT files generated in the first step.

Many HarryPlotter parameters take lists of values, and the number of elements is not restricted. Two examples: most parameters of the InputRoot module take one element per input object to be read in. Similarly, most plotting settings in the plot modules, e.g. PlotRoot, take one element per object to be plotted. The --help option clearly marks these list-type parameters by listing multiple possible values. The long forms of the parameter names should also contain the plural "s".

HarryPlotter has to ensure that all lists that belong together have the same number of elements. The parameters that belong to such an expansion group are usually all list-type parameters of a certain parameter group, e.g. the Input options.

harry.py --help | grep "Input options" -A 75

If the specified parameter values do not match in list length, HarryPlotter expands the shorter lists to match the size of the longest list. These expansions are performed by the function Processor.prepare_list_args, which provides extensive debug output for all steps and prints warnings when an expansion might not be what the user intended. All expansions can therefore be found by searching for calls of this function.
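
The expansion rule can be illustrated with a few lines of Python. This is only a minimal sketch of the idea, not the code of Processor.prepare_list_args: the function name expand_lists and the plain dictionary of lists are assumptions made for this illustration, and every list is assumed to be non-empty.

```python
from itertools import cycle, islice


def expand_lists(params):
    """Repeat every list cyclically until it is as long as the longest one.

    Hypothetical helper for illustration; not part of HarryPlotter.
    """
    target = max(len(values) for values in params.values())
    return {
        name: list(islice(cycle(values), target))
        for name, values in params.items()
    }


expanded = expand_lists({
    "files": ["gaussians3.root"],
    "folders": ["gaussians1"],
    "x_expressions": ["var0", "var1", "var2"],
})

# Each parameter now has three elements, one per item.
for name, values in expanded.items():
    print(name, "->", values)
```

Lists with a single element are simply duplicated, while longer lists are repeated in their original order.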

Example: expansion in InputRoot

In the InputRoot module all list-type parameters regarding the inputs are expanded:

  • nicks
  • directories
  • files
  • folders
  • friend_trees
  • x_expressions
  • y_expressions
  • z_expressions
  • x_bins
  • y_bins
  • z_bins
  • weights
  • tree_draw_options
  • scale_factors

Only one item - no expansion

In the simplest example

harry.py --log-level debug -i gaussians3.root -f gaussians1 -x var0

the following debug information is printed out.

Argument list expansion: InputRoot options
	Item 0:
		nicks -> None
		x_expressions -> var0
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians3.root']
		directories -> None
		folders -> ['gaussians1']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 

As the maximum list length is 1, there is only one item to iterate (Item 0). The specified values, files -> ['gaussians3.root'], folders -> ['gaussians1'] and x_expressions -> var0 are recognised as expected. The values for the other parameters remain empty or, if available, at the given default value.

Iteration over only one parameter

In most cases, multiple graphs have to be shown in a single plot that differ in only one aspect. In the next example three different branches are plotted from the same tree in the same file.

harry.py --log-level debug -i gaussians3.root -f gaussians1 -x var0 var1 var2

Then the debug output lists three iterations.

Argument list expansion: InputRoot options
	Item 0:
		nicks -> None
		x_expressions -> var0
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians3.root']
		directories -> None
		folders -> ['gaussians1']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 
	Item 1:
		nicks -> None
		x_expressions -> var1
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians3.root']
		directories -> None
		folders -> ['gaussians1']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 
	Item 2:
		nicks -> None
		x_expressions -> var2
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians3.root']
		directories -> None
		folders -> ['gaussians1']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 

The only difference between the three iterations is the value of x_expressions. All other list elements have been repeated three times. An iteration over different trees works similarly.

harry.py --log-level debug -i gaussians3.root -f gaussians1 gaussians2 gaussians3 -x var0

Simultaneous iteration over multiple parameters

Now the var0 values from one tree inside each file have to be plotted. A loop over the files alone is not sufficient since the trees in the files are named differently.

for file in gaussians*.root; do echo "$file contains:"; get_root_file_content.py "$file"; echo; done

gives

gaussians.root contains:
gaussians (TTree)

gaussians1000.root contains:
gaussians (TTree)

gaussians3.root contains:
gaussians1 (TTree)
gaussians2 (TTree)
gaussians3 (TTree)

The plotting command now is

higgsplot.py --log-level debug -i gaussians.root gaussians3.root gaussians1000.root -f gaussians gaussians1 gaussians -x var0

Then the debug output lists again three iterations as the maximum number of elements of all parameters is 3.

Argument list expansion: InputRoot options
	Item 0:
		nicks -> None
		x_expressions -> var0
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians.root']
		directories -> None
		folders -> ['gaussians']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 
	Item 1:
		nicks -> None
		x_expressions -> var0
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians3.root']
		directories -> None
		folders -> ['gaussians1']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 
	Item 2:
		nicks -> None
		x_expressions -> var0
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians1000.root']
		directories -> None
		folders -> ['gaussians']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 

Now both the files and folders parameters are varied. In each iteration the parameter values are matched in exactly the order in which they are specified in the program arguments.
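
The positional matching can be pictured as a plain zip over the lists once they have equal length. The snippet below is a sketch for illustration only and does not reproduce HarryPlotter's internals; the variable names are chosen to mirror the parameter names, and x_expressions is written out already expanded to three elements.

```python
files = ["gaussians.root", "gaussians3.root", "gaussians1000.root"]
folders = ["gaussians", "gaussians1", "gaussians"]
x_expressions = ["var0", "var0", "var0"]  # single value, already expanded

# Element i of every list ends up in item i.
items = list(zip(files, folders, x_expressions))

for index, (file_name, folder, x_expression) in enumerate(items):
    print("Item %d: %s / %s / %s" % (index, file_name, folder, x_expression))
```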

Warnings of possibly unintentional behaviour

The same plot is produced by

higgsplot.py -i gaussians.root gaussians3.root gaussians1000.root -f gaussians gaussians1 -x var0

but the following warning is given

WARNING: Parameters 'nicks', 'x_expressions', 'y_expressions', 'z_expressions', 'x_bins', 'y_bins', 'z_bins', 'scale_factors', 'files', 'directories' require parameter list length of 3. Parameters 'folders'(2) will be replicated to match required length.

HarryPlotter repeats the elements of lists that are shorter than the longest list. If a shorter list contains only one element, the behaviour is trivial: this element is duplicated until the number of elements corresponds to the length of the longest list (see section Iteration over only one parameter).

If the shorter list contains more than one element, the behaviour is still well defined: the full list is repeated until the number of elements corresponds to the length of the longest list. Example: the shorter list contains two elements a and b and the longest list contains five elements. This results in an expansion of the shorter list to [a, b, a, b, a]. These elements are then matched with the ones of the longest list in the same order.

In most cases, when a parameter is configured with more than one element but fewer elements than the longest list, the resulting behaviour is not intended by the user and HarryPlotter assumes a misconfiguration. Therefore the warning is printed.

In the example above, the warning says that the parameters files and others have a length of 3 (or could be trivially expanded) and that the parameter folders has only two values and therefore gets expanded.
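
The condition that triggers the warning, as described above, can be sketched in Python. The helper nontrivial_expansions is hypothetical and only mirrors the rule stated in the text (more than one element, but fewer than the longest list); the actual check inside HarryPlotter may be implemented differently.

```python
def nontrivial_expansions(params):
    """Return the names of lists with more than one element that are
    still shorter than the longest list (hypothetical helper)."""
    target = max(len(values) for values in params.values())
    return sorted(
        name for name, values in params.items()
        if 1 < len(values) < target
    )


suspicious = nontrivial_expansions({
    "files": ["gaussians.root", "gaussians3.root", "gaussians1000.root"],
    "folders": ["gaussians", "gaussians1"],
    "x_expressions": ["var0"],
})
print(suspicious)
```

Only folders is reported here: x_expressions has a single element and is expanded trivially, and files already has the required length.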

The following example shows a misconfiguration

higgsplot.py -i gaussians3.root gaussians.root gaussians1000.root -f gaussians1 gaussians -x var0

Again, the same warning is given. The debug output (--log-level debug) reveals

	Item 2:
		nicks -> None
		x_expressions -> var0
		y_expressions -> None
		z_expressions -> None
		x_bins -> None
		y_bins -> None
		z_bins -> None
		scale_factors -> 1.0
		files -> ['gaussians1000.root']
		directories -> None
		folders -> ['gaussians1']
		weights -> 1.0
		friend_trees -> None
		tree_draw_options -> 

that in the last item the first configured folder (gaussians1) is again matched to the iteration with files -> ['gaussians1000.root']. As this file does not contain a tree named gaussians1, the configuration is wrong and an error is thrown later.

ERROR: Could not find ROOT object "gaussians1" in file "gaussians1000.root"! (roottools.py: line 71)
CRITICAL: Error getting ROOT object from file. Exiting. (inputroot.py: line 153)
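
The mismatch can be reproduced offline with a small Python sketch. The dictionary of tree names is copied by hand from the get_root_file_content.py listing earlier in this tutorial, and the cyclic expansion is the rule described in the text; none of this is HarryPlotter code.

```python
from itertools import cycle, islice

# Tree names per file, copied from the listing shown earlier.
trees = {
    "gaussians.root": {"gaussians"},
    "gaussians1000.root": {"gaussians"},
    "gaussians3.root": {"gaussians1", "gaussians2", "gaussians3"},
}

files = ["gaussians3.root", "gaussians.root", "gaussians1000.root"]
folders = ["gaussians1", "gaussians"]

# Repeat the shorter list cyclically to the length of the files list.
expanded_folders = list(islice(cycle(folders), len(files)))

# Collect all (file, folder) pairs where the tree does not exist.
mismatches = [
    (file_name, folder)
    for file_name, folder in zip(files, expanded_folders)
    if folder not in trees[file_name]
]
print(mismatches)
```

The only mismatch is the third item, where gaussians1 is paired with gaussians1000.root.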

In other cases, the run of HarryPlotter is still well defined and does not fail. However, a possible misconfiguration would be more difficult to notice without this warning.