Writing Plugins - TC01/Treemaker GitHub Wiki
Writing Plugins
As we have discussed, Treemaker is built around the concept of plugins:
little Python scripts that live in Treemaker/python/plugins/. To actually use
Treemaker for your analysis, you will likely need to write some plugins. If
the plugins you need already exist, you can copy them into that directory
and skip directly to the next section!
Writing a plugin is not difficult. Create a new Python file and paste the template below into it; the parts of the template are explained in the sections that follow.
For more help, look at the other plugins shipped in this repository, or contact me directly.
Template Plugin
import array

from Treemaker.Treemaker import cuts

# This dictionary is filled by the config file.
parameters = {}

# The input type; defaults to Ntuple if absent.
input_type = "Ntuple"

def setup(variables, isData):
    # Create one single-element float array per output variable.
    variables['varname'] = array.array('f', [-1.0])
    return variables

def analyze(event, variables, labels, isData, cutDict):
    handle = labels['module']['label']
    product = handle.product()
    # Product is likely a vector, so some more processing should happen here.
    variables['varname'][0] = product[0]
    return variables, cutDict

def createCuts(cutDict):
    cutDict['example'] = cuts.Cut('example', 'This is some example cut description')
    return cutDict

def drop(event, variables, cutArray, leaves, isData):
    # Don't drop any events; but if we wanted to, we could exclude some from ever being filled.
    return False
Here, variables, cutDict, and labels are all dictionaries.
The reset function
In Treemaker versions up to v1.1, you needed to manually reset your variables
back to the state you wanted them in using the reset function:
def reset(variables):
    variables['varname'][0] = -1.0
    return variables
In Treemaker v1.2 and up, this is no longer necessary. Variables are
automatically reset to their value at creation (in setup) after each event
is read, and using reset now produces a deprecation warning. Note, though,
that you may want to run some extra cleanup code, so the reset function
will still be called and executed if it exists.
Labels
The labels dictionary is produced by running edmDumpEventContent over the
ntuples that are being converted. The result is a two-level dictionary:
module names (ex: "diffmoca8pp") are keys for inner dictionaries, whose keys
are the actual label names (ex: "PrunedCA8Jets") and whose values are Handle
objects.
On every event, code is run that automatically calls event.getByLabel() for all labels in an ntuple, so you don't have to do this yourself. You can look up the handles you need directly, as seen above.
variables is a dictionary mapping variable names to array objects. The
setup() function is called in all plugins to create this dictionary;
TTree branches are then set up for each variable in that dictionary.
Then, when the analyze() step runs, you can look up any handles you need
by their module and label names, do whatever processing you want, fill the
dictionary of variables, and return it. Treemaker will automatically call
tree.Fill() after it has finished calling all of these methods.
reset() is called after filling the TTree, to restore all arrays to their
default value.
The example code above would successfully add a single variable to a TTree, but as you might expect, you can do much more complicated things in plugins.
Lazy Loading
Because creating Handles is unfortunately very computationally expensive,
at least from PyROOT, Treemaker has implemented "lazy loading" of the labels
dictionary since version 0.3.
What this means is that a Handle is not created for a collection from the ntuple
until you attempt to fetch it. When the code handle = labels['module']['label']
(from the above example) runs for the first time, the handle object will be created
and returned. On subsequent runs, the dictionary will remember that the handle was
already created and just return it.
But if you don't ask for a Handle for that collection, one will never be created, and the job will take that much less time to run.
This means the speed at which your code runs is directly tied to the number of collections in the ntuple you need to access. This is a major speed improvement over the original Treemaker version, where we created a Handle for every collection in the ntuple regardless of whether or not the user requested them.
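One common way to get this kind of create-on-first-access behaviour in Python is a dict subclass that defines __missing__. The sketch below illustrates the pattern only; LazyLabels and make_handle are hypothetical names, and this is not a copy of how Treemaker builds its labels dictionary.

```python
class LazyLabels(dict):
    """Hypothetical stand-in for one module's label dictionary."""

    def __init__(self, make_handle):
        super(LazyLabels, self).__init__()
        # Factory called once per label, the first time it is requested.
        self._make_handle = make_handle

    def __missing__(self, label):
        # Only reached when the label is not yet stored in the dictionary.
        handle = self._make_handle(label)
        self[label] = handle
        return handle


created = []


def make_handle(label):
    # Stand-in for the expensive Handle construction.
    created.append(label)
    return object()


labels = {'module': LazyLabels(make_handle)}

first = labels['module']['label']    # handle is created here
second = labels['module']['label']   # cached handle is returned
print(first is second, created)
```

Labels that are never requested never reach make_handle, which is the saving described above.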
Cuts
The cuts functionality was added in version 0.2. Cuts are essentially
special variables that are stored in the resulting TTree as a single integer
array. This allows you to implement cuts you'll need later in your analysis
(triggers, for instance). When you then want to run an analysis, you can
quickly check whether cuts[i] is 0 or 1 (or some other value) to determine
whether the event passed whatever analysis was mapped to cut i.
Because cuts are named rather than numbered, a report file
(*_cuts_report.txt) is generated after Treemaker finishes running, with
a mapping between indices in the cuts array in the tree and the names
you gave them in your plugins. This information can also be generated
by running the command treemaker-cuts _name_of_config_file over your
config file (see the config file section for more information).
You can create cuts in the createCuts method and write to them in the
makeCuts method. Note that the template above passes cutDict to analyze,
which returns it alongside variables.
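To make the name-to-index idea concrete, here is a small sketch that uses plain Python objects in place of Treemaker's cut machinery. The cut names, the index assignment, and the report line format are all assumptions made for illustration; the real report file may look different.

```python
import array

# Hypothetical cut names; in Treemaker they come from each plugin's createCuts().
cut_names = ['trigger', 'example']

# Assumed layout: index i in the cuts array corresponds to the i-th name.
index_of = {name: i for i, name in enumerate(cut_names)}
cuts = array.array('i', [0] * len(cut_names))

# Pretend the current event passed the trigger cut.
cuts[index_of['trigger']] = 1

# Lines in the spirit of a cuts report: an index followed by a name.
report_lines = ['%d %s' % (i, name) for i, name in enumerate(cut_names)]

# Later, in an analysis, a cut is checked by its index.
passed_trigger = cuts[index_of['trigger']] == 1
print(report_lines, passed_trigger)
```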
Parameters
The parameters dictionary is a new Treemaker feature, added during the polishing and
development of Treemaker v1.0. It allows the user to specify constants in a configuration
file and access them from multiple plugins.
For example, if the configuration file features a section that looks like this:
[parameters]
weight = 2.0
then all plugins will find the variable "weight" inside the parameters
dictionary, as follows:
try:
    weight = float(parameters['weight'])
except KeyError:
    pass
It is recommended that you wrap uses of the parameters dictionary in a
try/except block, as there is no guarantee the parameter will appear in a
config file.
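One possible refinement of the try/except pattern is to supply a fallback value, so that the variable is always defined even when the parameter is missing. The sketch below uses a plain dictionary as a stand-in for the one Treemaker fills; get_float is a hypothetical helper, and the string values are an assumption based on the float() conversion in the example above.

```python
# Stand-in for the dictionary Treemaker fills from the [parameters] section.
# Values are assumed to arrive as strings, hence the float() conversion.
parameters = {'weight': '2.0'}


def get_float(params, name, default):
    # Hypothetical helper: fall back to a default if the key is absent.
    try:
        return float(params[name])
    except KeyError:
        return default


weight = get_float(parameters, 'weight', 1.0)
scale = get_float(parameters, 'scale', 1.0)   # not present in the config above
print(weight, scale)
```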
Drop Function
New in Treemaker v1.1 is the drop function. It is called immediately after the
analyze function and takes the arguments shown in the template above
(event, variables, cutArray, leaves, isData). If any single plugin's drop
returns True instead of False, Treemaker will not fill the output tree with the
current event and will instead move on to the next event.
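The "any single plugin can veto the fill" rule can be sketched with a toy event loop. Everything here is a stand-in: events are plain integers, the two plugin classes are hypothetical, and appending to a list takes the place of tree.Fill(). It is only meant to show the control flow, not Treemaker's actual driver code.

```python
class KeepAll(object):
    @staticmethod
    def drop(event, variables, cutArray, leaves, isData):
        # Never vetoes an event.
        return False


class DropOddEvents(object):
    @staticmethod
    def drop(event, variables, cutArray, leaves, isData):
        # Hypothetical rule: veto events with an odd number.
        return event % 2 == 1


plugins = [KeepAll, DropOddEvents]
filled = []

for event in range(6):
    variables, cutArray, leaves = {}, [], {}
    # Each plugin's analyze() would run at this point.
    if any(p.drop(event, variables, cutArray, leaves, False) for p in plugins):
        continue  # one plugin returned True: skip the fill, go to the next event
    filled.append(event)  # stand-in for tree.Fill()

print(filled)
```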