Writing Plugins - TC01/Treemaker GitHub Wiki

Writing Plugins

As we have discussed, Treemaker is built around the concept of plugins: small Python scripts that live in Treemaker/python/plugins/. To use Treemaker for your analysis, you will likely need to write some plugins of your own. If existing plugins already cover your needs, you can copy them into that directory and skip directly to the next section!

Writing a plugin is not difficult. Create a new Python file and copy the template below into it. The sections that follow explain each part of the template.

For more help, look at other plugins that are shipped in this repository or contact me directly for more information.

Template Plugin

import array

from Treemaker.Treemaker import cuts

# This dictionary is filled by the config file.
parameters = {}

# The input type; defaults to Ntuple if absent.
input_type = "Ntuple"

def setup(variables, isData):
    variables['varname'] = array.array('f', [-1.0])
    return variables

def analyze(event, variables, labels, isData, cutDict):
    handle = labels['module']['label']
    product = handle.product()
    # Product is likely a vector, so some more processing should happen here.
    variables['varname'][0] = product[0]
    return variables, cutDict

def createCuts(cutDict):
    cutDict['example'] = cuts.Cut('example', 'This is some example cut description')
    return cutDict

def drop(event, variables, cutArray, leaves, isData):
    # Don't drop any events; but if we wanted to, we could exclude some from ever being filled.
    return False

Here, variables, cutDict, and labels are all dictionaries.

The reset function

In Treemaker versions up to v1.1, you needed to manually reset your variables back to the state you wanted them in using the reset function:

def reset(variables):
    variables['varname'][0] = -1.0
    return variables

In Treemaker v1.2 and up, this is no longer necessary. Variables are automatically reset to the values they were created with (in setup) after each event is read, and using reset for this purpose now triggers a deprecation warning. However, because you may still want to run extra cleanup code, the reset function is still called if it exists.

Labels

The labels dictionary is built by running edmDumpEventContent over the ntuples that are being converted. The result is a two-level dictionary: module names (e.g. "diffmoca8pp") are keys to inner dictionaries, in which the label names (e.g. "PrunedCA8Jets") are keys to Handle objects.

On every event, code is run that automatically calls event.getByLabel() for all labels in an ntuple, so you don't have to do this yourself. You can look up the handles you need directly, as in the template above.

variables is a dictionary mapping variable names to array objects. The setup() function is called in all plugins to create this dictionary; TTree branches are then set up for each variable in that dictionary.

Then, when the analyze() step runs, you can look up any handles you need by their module and label names, do whatever processing you want, fill the dictionary of variables, and return it. Treemaker automatically calls tree.Fill() once all of these functions have been called.

reset(), if defined, is called after filling the TTree, to restore all arrays to their default values.

The example code above would successfully add a single variable to a TTree, but as you might expect, you can do much more complicated things in plugins.

Lazy Loading

Creating Handles is unfortunately very computationally expensive, at least from PyROOT. For this reason, Treemaker has implemented "lazy loading" of the labels dictionary since version 0.3.

What this means is that a Handle is not created for a collection from the ntuple until you attempt to fetch it. When the code handle = labels['module']['label'] (from the above example) runs for the first time, the handle object will be created and returned. On subsequent runs, the dictionary will remember that the handle was already created and just return it.

But if you don't ask for a Handle for that collection, one will never be created, and the job will take that much less time to run.

This means the running time of your job depends on the number of collections you actually access, not on the total number of collections in the ntuple. This is a major speed improvement over the original Treemaker version, which created a Handle for every collection in the ntuple regardless of whether the user requested it.
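One common way to get this behavior in Python is a dict subclass that defines __missing__, which is called only when a key is looked up and not yet present. The sketch below illustrates the technique and is not Treemaker's actual code; FakeHandle, LazyLabels, and the label "OtherJets" are hypothetical names.

```python
class FakeHandle(object):
    """Stand-in for a Handle; counts how many have been constructed."""
    created = 0

    def __init__(self, module, label):
        FakeHandle.created += 1
        self.module, self.label = module, label

class LazyLabels(dict):
    """Inner dictionary for one module; builds a handle only on first lookup."""
    def __init__(self, module, known_labels):
        dict.__init__(self)
        self.module = module
        self.known_labels = set(known_labels)

    def __missing__(self, label):
        if label not in self.known_labels:
            raise KeyError(label)
        handle = FakeHandle(self.module, label)
        self[label] = handle   # cache it so later lookups reuse this object
        return handle

labels = {'diffmoca8pp': LazyLabels('diffmoca8pp', ['PrunedCA8Jets', 'OtherJets'])}

first = labels['diffmoca8pp']['PrunedCA8Jets']    # handle is created here
second = labels['diffmoca8pp']['PrunedCA8Jets']   # cached handle is returned
```

In this sketch "OtherJets" is never requested, so no handle is ever built for it.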

Cuts

The cuts functionality was added in version 0.2. Cuts are essentially special variables that are stored in the resulting TTree as a single integer array. This allows you to record cuts you'll need later in your analysis (triggers, for instance). When you then run an analysis, you can quickly check whether cuts[i] is 0 or 1 (or some other value) to determine whether the event passed whatever selection was mapped to cut i.

Because cuts are named rather than numbered, a report file (*_cuts_report.txt) is generated after Treemaker finishes running. It maps indices in the tree's cuts array to the names you gave the cuts in your plugins. This information can also be generated by running the command treemaker-cuts _name_of_config_file over your config file (see the config file section for more information).

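You can create cuts in the createCuts function and write to them in the makeCuts function. Note that the template above does not define makeCuts; there, the cut dictionary is passed to analyze as cutDict and returned from it.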
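Reading cuts back by name might look like the sketch below. It assumes you have already turned the name-to-index mapping into a dictionary; the cut_index contents, the 'trigger' cut, and the "nonzero means passed" convention are all assumptions made for illustration, and the report file's actual layout is not shown here.

```python
import array

# Hypothetical name-to-index mapping, of the kind the cuts report describes.
cut_index = {'trigger': 0, 'example': 1}

# Stand-in for the integer cuts array stored in the tree for one event.
cuts = array.array('i', [1, 0])

def passed(cuts, cut_index, name):
    """Return True if the named cut holds a nonzero value for this event."""
    return cuts[cut_index[name]] != 0

selected = [name for name in sorted(cut_index) if passed(cuts, cut_index, name)]
```

Looking cuts up by name keeps analysis code readable and avoids hard-coding indices that could change when plugins are added or reordered.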

Parameters

The parameters dictionary was added during the development of Treemaker v1.0. It allows the user to specify constants in a configuration file and access them from multiple plugins.

For example, if the configuration file features a section that looks like this:

[parameters]
weight = 2.0

All plugins can then read "weight" from the parameters dictionary as follows:

try:
    weight = float(parameters['weight'])
except KeyError:
    pass

Consider wrapping uses of the parameters dictionary in a try/except block, as there is no guarantee that a parameter will appear in a given config file.
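If several plugins read parameters, the try/except can be folded into a small helper that falls back to a default. This is a sketch only: get_float, the 'scale' parameter, and the default of 1.0 are hypothetical, and it assumes values arrive as strings, as the float() call above suggests.

```python
# Stand-in for the dictionary Treemaker would fill from the config file.
parameters = {'weight': '2.0'}

def get_float(parameters, name, default):
    """Return parameters[name] as a float, or default if missing or malformed."""
    try:
        return float(parameters[name])
    except (KeyError, ValueError):
        return default

weight = get_float(parameters, 'weight', 1.0)
scale = get_float(parameters, 'scale', 1.0)   # 'scale' is not in the config
```

Unlike a bare pass in the except branch, this always leaves the variable defined, so later code never hits a NameError when the parameter is absent.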

Drop Function

The drop function is new in Treemaker v1.1. It is called immediately after the analyze function; the template above shows its signature. If any plugin's drop returns True instead of False, Treemaker will not fill the output tree with the current event and will instead move on to the next event.
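The "any plugin can veto the event" rule can be sketched as follows. This is an illustration rather than Treemaker's real event loop: the two plugin classes and should_drop are hypothetical, and classes stand in for plugin modules so that the example is self-contained.

```python
import array

class KeepAll(object):
    """Hypothetical plugin that never drops an event."""
    @staticmethod
    def drop(event, variables, cutArray, leaves, isData):
        return False

class RequirePositive(object):
    """Hypothetical plugin that drops events with a negative 'varname'."""
    @staticmethod
    def drop(event, variables, cutArray, leaves, isData):
        return variables['varname'][0] < 0

def should_drop(plugins, event, variables, cutArray, leaves, isData):
    # A single True from any plugin is enough to skip the fill.
    return any(p.drop(event, variables, cutArray, leaves, isData) for p in plugins)

plugins = [KeepAll, RequirePositive]
variables = {'varname': array.array('f', [-1.0])}
filled = []
for value in [3.0, -1.0, 7.0]:
    variables['varname'][0] = value            # what analyze() would have set
    if should_drop(plugins, None, variables, [], [], False):
        continue                               # skip the fill for this event
    filled.append(variables['varname'][0])     # stands in for tree.Fill()
```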