How to create a ground truth image from LArSoft UBooNECode data - twongjirad/LArLiteSoftCookBook GitHub Wiki

Quick expert summary on how to do this

Check out a copy of uboonecode using v06_26_01_09 and get the DL_br branch
In uboonecode/LArCVImageMaker create a Supera process module. Use one of the existing ones as a template. Essentially, the module gathers truth LArSoft data products and passes them to a function that creates your ground truth labels
The module definition must also include a factory instance for the module, so that the manager module, LArSoftSuperaSriver, can look up the new Supera module and run it
LArSoftSuperaSriver is configured using a supera fcl file, which lets you run a series of Supera modules. A module can reference the products of upstream Supera modules if that helps in generating your ground truth
After you write your module, you need to set up the fcl files to run it
To do so, add a configuration for it in dlprod_fclbase_analyzers.fcl by creating a copy of SuperaWholeView and pointing its supera_params parameter at your supera fcl file
Add the configuration to the set of DL analyzers: dlprod_analyzers
Then you can call this analyzer in your driver fcl file. An example to modify is in uboonecode/fcl/deeplearn/standard_larcv_uboone.fcl

Tutorial

Background notes

This work will be done on the Fermilab machines, so you will need an account there.

The goal of this tutorial is to walk you through the task of generating truth information in the form of a labeled image. The task will be to produce labels that let us distinguish individual particles in an image. This is a specific task so that the tutorial can be concrete. Of course, your truth labels and tasks will almost certainly be different.

Truth information includes (but is not limited to) the following: 1) pixel-wise labels in an image, 2) a list of labels for a whole image, and 3) bounding boxes indicating the location of an object in an image. The LArCV framework provides a means of storing all three types of truth information within a ROOT file. It can, of course, store images, whether images of truth labels or images of the TPC data. It does so with the Image2D class. LArCV also allows one to store a set of labels and bounding boxes using the ROI class. Each ROI instance can store a bounding box, i.e. an (x,y) position along with a height and width, AND a label. You can choose to use it to store one type of information, the other, or both.

For example, if you want to label an entire image as either neutrino or cosmic, you would save an ROI object with either a neutrino or cosmic label for each image. You would then leave the bounding box info unused. However, if you wanted to save the bounding box positions of individual particles coming from a neutrino interaction, you would save the box information and particle ID of each particle using an ROI instance for each particle.
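The two uses of a labeled record described above can be sketched in a few lines of Python. This is only an illustration of the idea: TruthROI is a hypothetical stand-in written for this page, not the LArCV ROI class, and the label strings and box coordinates are made-up values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TruthROI:
    """Hypothetical stand-in for an ROI-like record (NOT the LArCV ROI class)."""
    label: str
    # (x, y, width, height); None means the bounding box is left unused
    bbox: Optional[Tuple[float, float, float, float]] = None


# Case 1: one label for the whole image, bounding box unused.
image_level = [TruthROI(label="neutrino")]

# Case 2: one record per particle, each with its own box and particle label.
# The coordinates below are invented for illustration only.
particle_level = [
    TruthROI(label="muon", bbox=(120.0, 40.0, 300.0, 15.0)),
    TruthROI(label="proton", bbox=(120.0, 40.0, 25.0, 30.0)),
]
```

The same record type serves both cases; what changes is how many records are stored per image and whether the box is filled.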

The task described in this tutorial will be on how to fill an image with truth information.

Setup your larsoft environment

When starting out, you will need to get a copy of larsoft that contains the image conversion tools required and then build that code. You will then modify this copy and rebuild when you implement your specific code for generating your ground truth image.

When coming back to the task, you do not need to re-download the code. But each time you log into the Fermilab machines, you will need to setup your code's environment variables again.

The general description for both of these tasks is given here.

Steps (checking out a fresh copy)

First setup access to the uboonecode software

source /grid/fermiapp/products/uboone/setup_uboone.sh

Next, activate a specific version of uboonecode (warning this changes over time)

setup uboonecode v06_26_01_09 -q e10:prof

When you do this, environment variables setting up the different pieces of larsoft necessary for this version of uboonecode will have been defined. For example, you can run echo $UBOONECODE_DIR to see where this copy of uboonecode lives.

However, we don't want to use the existing copy. We want to get our own version of the source, build it, and run our copy of uboonecode instead. To do this, first make a directory where our copy will live (you can choose whatever directory name you want, but I've made a concrete choice of dl_brdev here).

mkdir dl_brdev

Note, on the uboone machines, you should make this directory somewhere on the uboone app folder

/uboone/app/users/[your username]

(if this location doesn't exist, you can make one.)

Now we will use mrb, a tool that helps us to manage our larsoft development environment, to start a new development space. Go into the dl_brdev folder and run

cd dl_brdev
mrb newDev

You'll see something similar to this:

building development area for larsoft v06_26_01_08 -q e10:prof

MRB_BUILDDIR is /uboone/app/users/tmw/dev/dl_brdev/build_slf6.x86_64
MRB_SOURCE is /uboone/app/users/tmw/dev/dl_brdev/srcs 
INFO: copying /cvmfs/fermilab.opensciencegrid.org/products/larsoft/larsoft/v06_26_01_08/releaseDB/base_dependency_database
 
IMPORTANT: You must type
source /uboone/app/users/tmw/dev/dl_brdev/localProducts_larsoft_v06_26_01_08_e10_prof/setup
NOW and whenever you log in

We follow the advice

source localProducts_larsoft_v06_26_01_08_e10_prof/setup

Now we have this development environment setup. But to develop, we need source code. We go into the sources folder, srcs and type

cd srcs
mrb g uboonecode

We have to checkout a specific branch of the uboonecode repo. Go to the uboonecode directory

cd uboonecode/
git checkout DL_br

You should see this

Branch DL_br set up to track remote branch DL_br from origin.
Switched to a new branch 'DL_br'

Now that we have copies of the source, let's finish setting up the mrb build environment with

mrbsetenv

Now we can start the build

mrb i -j4

You can go get a snack or coffee at this point. This will take 5-10 minutes.

Steps (re-activating an existing copy)

First, setup the cvmfs software

source /grid/fermiapp/products/uboone/setup_uboone.sh

Go to your copy of uboonecode. Following the example above, this would be something like

/uboone/app/users/[your username]/dl_brdev

Reactivate your build environment

source localProducts_larsoft_v06_26_01_08_e10_prof/setup
mrbsetenv

(Note that localProducts_larsoft_v06_26_01_08_e10_prof might be different. It depends on the version of the code that was current when the software was set up.) If everything goes OK, you should see something like this

The working build directory is /uboone/app/users/tmw/dev/dl_brdev/build_slf6.x86_64
The source code directory is /uboone/app/users/tmw/dev/dl_brdev/srcs
----------- check this block for errors -----------------------
----------------------------------------------------------------

Now you're ready to go.

Create the files we need for our truth to image conversion

With the uboonecode environment built, we'll walk through how to create a program that extracts some information from larsoft data files and saves it to an image. In particular, our example will label the pixels of each image according to which particle instance they come from. We will also save the bounding boxes around those instances.

But first, it might be helpful to review this small set of slides. The slides attempt to provide a quick overview of LArSoft, LArCV and how the interface routines (Supera modules) are organized and operate.

We have provided two template files that one can use to build a Supera module

SuperaExampleTemplate.h.example
SuperaExampleTemplate.cxx.example

First thing is to make a copy of both of these files, replacing ExampleTemplate with some name that describes the output being made. We will change it to InstanceImage

cp SuperaExampleTemplate.h.template SuperaInstanceImage.h
cp SuperaExampleTemplate.cxx.template SuperaInstanceImage.cxx

Note, you also need to edit the contents of the header and source files so that every instance of ExampleTemplate becomes InstanceImage.
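If you would rather not do the rename by hand, the copy and the text substitution can be combined. The helper below is a minimal sketch written for this page (copy_with_rename is a hypothetical name, not part of uboonecode); it does a plain, case-sensitive string replacement, so any differently-cased variants of the name (for example an upper-case include guard) would still need to be edited manually.

```python
from pathlib import Path


def copy_with_rename(src, dst, old="ExampleTemplate", new="InstanceImage"):
    """Copy a text file from src to dst, replacing every occurrence of `old`.

    Hypothetical helper for illustration. Returns the number of occurrences
    of `old` found in the source, so the caller can confirm something changed.
    """
    text = Path(src).read_text()
    Path(dst).write_text(text.replace(old, new))
    return text.count(old)
```

It would be called once for the header and once for the source file, passing the template file name as it appears in your checkout and the new SuperaInstanceImage file name.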

Once you replace the name, this example should build. Because we created a new file, we should rerun cmake to rebuild the build folder:

mrb i -j4

This example takes in truth information from LArSoft data products. The truth data products it uses are

MCTrack: a vector of MCTrack class instances, each representing the true trajectory history through the TPC active volume for a "track-like" particle: muons, pions, protons, recoiling nuclei (and nuclear fragments)
MCShower: a vector of MCShower class instances, each containing information on a "shower-like" particle: electrons and showers produced by a photon. The exact trajectory history is not stored, as it is not well defined for a shower. Instead, the shower start and the initial direction are stored.
SimCh: a vector of SimCh class instances, which allow one to map the charge seen on a wire at a given time to the particle that deposited the energy and the true position where it was deposited.
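To make the idea of an instance image concrete, here is a small pure-Python sketch of how SimCh-like information could be turned into pixel labels and bounding boxes. It is not the code in the Supera template. The input format, a list of (wire, tick, track_id, energy) tuples, is an assumption made for illustration, as is the convention that track IDs are positive so that 0 can mean "no label". Each pixel is assigned the ID of the particle that deposited the most energy in it.

```python
from collections import defaultdict


def make_instance_image(deposits, n_wires, n_ticks):
    """Build a 2D list of track IDs from (wire, tick, track_id, energy) tuples.

    Assumed input format (hypothetical). 0 marks pixels with no deposit.
    """
    # Sum the energy each particle deposited in each pixel.
    energy = defaultdict(float)
    for wire, tick, track_id, e in deposits:
        if 0 <= wire < n_wires and 0 <= tick < n_ticks:
            energy[(wire, tick, track_id)] += e

    # For each pixel keep the particle with the largest summed energy.
    best = {}
    for (wire, tick, track_id), e in energy.items():
        if (wire, tick) not in best or e > best[(wire, tick)][1]:
            best[(wire, tick)] = (track_id, e)

    image = [[0] * n_ticks for _ in range(n_wires)]
    for (wire, tick), (track_id, _) in best.items():
        image[wire][tick] = track_id
    return image


def bounding_boxes(image):
    """Return {track_id: (min_wire, min_tick, max_wire, max_tick)}."""
    boxes = {}
    for w, row in enumerate(image):
        for t, tid in enumerate(row):
            if tid == 0:
                continue
            if tid not in boxes:
                boxes[tid] = [w, t, w, t]
            else:
                b = boxes[tid]
                b[0], b[1] = min(b[0], w), min(b[1], t)
                b[2], b[3] = max(b[2], w), max(b[3], t)
    return {tid: tuple(b) for tid, b in boxes.items()}


# Toy input: two particles (IDs 7 and 9) sharing one pixel at (1, 2).
deposits = [(1, 2, 7, 0.5), (1, 2, 9, 0.2), (1, 3, 7, 0.4),
            (2, 3, 9, 0.9), (3, 3, 9, 0.1)]
image = make_instance_image(deposits, n_wires=4, n_ticks=5)
boxes = bounding_boxes(image)
```

In the toy input the shared pixel goes to particle 7 because it deposited more energy there. A real module also has to handle the conversion from TPC ticks to image rows and any downsampling, which this sketch ignores.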

Setup the configuration (fcl) files for the new module

Pick a folder to run the code -- it can be anywhere. For this tutorial, we'll make a folder in the LArCVImageMaker folder. In that folder create our working directory (and go into it)

mkdir tutorial
cd tutorial

First, we need to create a parameter block for the LArSoft module (LArSoftSuperaSriver_module) that will call our Supera module. We can use an existing fcl file as a template. Copy the following file into our working directory with

cp ../../../fcl/deeplearn/dlprod_fclbase_analyzers.fcl .

(From the root folder of the build environment this folder is srcs/uboonecode/fcl/deeplearn.)

Open it. You'll see a parameter block that is serving as a template for other variations

SuperaWholeView: {
  module_type:     "LArSoftSuperaSriver"
  supera_params:   "supera_segment.fcl"
  out_filename:    "larcv.root"
  unique_filename: false
  stream:          "mc"
}

Note the key parameter, module_type. This is how LArSoft knows which module this parameter block is for. Once we insert it into LArSoft's event loop (later in this tutorial), LArSoft knows to create an instance of this module using this parameter block.

Now we will use this block to create an instance for our version of Supera. You'll see other instances in the file: SuperaFocusedView, SuperaBasic, SuperaMichelMC, etc.

Let's make one for ourselves

SuperaInstanceImage: @local::SuperaWholeView
SuperaInstanceImage.supera_params: "supera_instance_example.fcl"

This creates a copy of the parameter block and then changes the supera parameter fcl file to our custom one, which we will create next. But first, add our parameter block to another parameter block, dlprod_analyzers, that keeps a copy of all of our Supera module variations (along with some larsoft-to-larlite conversion modules). The block should look like this in the end

dlprod_analyzers:
{
  mcinfo:  @local::litemc_mcinfo
  simch:   @local::litemc_simch
  
  wire:    @local::litemc_wire
  opdigit: @local::litemc_opdigit
  opreco:  @local::litemc_opreco
  
  reco2d:  @local::litemc_reco2d
  
  superaWholeView:   @local::SuperaBasic
  superaWholeViewMC: @local::SuperaMCBasic
  superaFocusedView: @local::SuperaFocusedView
  superaFocusedViewPlus: @local::SuperaFocusedViewPlus
  superaFocusedView3D: @local::SuperaFocusedView3D
  superaMichelMC:    @local::SuperaMichelMC
  superaMichelData:  @local::SuperaMichelData
  superaCCPi0MC:     @local::SuperaCCPi0MC
  superaCCPi0Data:   @local::SuperaCCPi0Data
  superaInstanceImage: @local::SuperaInstanceImage
}

Now let's create the supera configuration file. Let's use a previous example:

cp ../supera_basic.fcl supera_instance_example.fcl

Open the file. You'll see the parameters ProcessType and ProcessName

ProcessType: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask"]
ProcessName: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask"]

The first set of values are the Supera module types we are going to run. The second is the instance name assigned to it. (We often use the module name as the instance name as you can see.)

You'll also see ProcessList. This block contains the parameter blocks for the modules listed above. The instance name is used to associate a parameter block with the Supera module instance.

So we need to make a block for our new module. Somewhere in the ProcessList block add

SuperaInstanceImage: {
  Verbosity: 0
  OutImageLabel: "instance"
  LArMCTruthProducer:  "generator"
  LArMCTrackProducer:  "mcreco"
  LArMCShowerProducer: "mcreco"
  LArSimChProducer:    "largeant"
  TimeOffset: 2400
  Origin: 0
}

and also add our module to the list of modules so that ProcessType and ProcessName look like

ProcessType: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask","SuperaInstanceImage"]
ProcessName: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask","SuperaInstanceImage"]

In SuperaInstanceImage, you might notice a bunch of parameters that we didn't request. For example, LArMCTruthProducer, LArMCTrackProducer, etc. These LAr[]Producer parameters tell the SuperaBase module (from which we inherited) which larsoft data products to use and their names.
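The relationship between ProcessType, ProcessName and ProcessList described above can be pictured with a short Python sketch. This only mirrors the pairing as the text describes it; pair_processes is a hypothetical helper, not the driver's actual code, and only two of the parameters from our block are reproduced.

```python
process_type = ["SuperaMetaMaker", "SuperaWire", "SuperaOpDigit",
                "SuperaChStatus", "WireMask", "SuperaInstanceImage"]
process_name = ["SuperaMetaMaker", "SuperaWire", "SuperaOpDigit",
                "SuperaChStatus", "WireMask", "SuperaInstanceImage"]

# Stand-in for the ProcessList block: parameter blocks keyed by instance name.
process_list = {
    "SuperaInstanceImage": {"Verbosity": 0, "OutImageLabel": "instance"},
}


def pair_processes(types, names, configs):
    """Pair the i-th module type with the i-th instance name and its block.

    Hypothetical illustration: an instance with no block gets an empty dict.
    """
    if len(types) != len(names):
        raise ValueError("ProcessType and ProcessName must have equal length")
    return [(t, n, configs.get(n, {})) for t, n in zip(types, names)]


pairs = pair_processes(process_type, process_name, process_list)
```

The point to take away is that the two lists are matched by position, while the parameter block is matched by instance name, so a typo in either list or in the block name breaks the association.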

Finally, we need to create a driver fcl file -- the config that pulls together all the different configuration blocks into one final parameter set that we can use to run the larsoft event loop program, lar.

Again, we'll use an existing example as a template. Copy one using

cp ../../../fcl/deeplearn/standard_supera_mc_noreco2d.fcl example_driver.fcl

Note how the top of this file has the line

#include "dlprod_fclbase_analyzers.fcl"

This command includes the fcl file we modified, which lists the different Supera configurations.

Advanced note: we copied this from the uboonecode/fcl/deeplearn folder. Because the uboonecode/fcl folder is where the experiment keeps (for the most part) official fcl files, there will be a copy of dlprod_fclbase_analyzers.fcl installed in the job folder for the build, ${MRB_INSTALL}/uboonecode/v06_26_01_07/job/. When the event loop configures itself by reading in the driver fcl file, it searches for the fcl files pulled in by #include commands. Where does it look for these? It uses the environment variable FHICL_FILE_PATH. You can view this variable using

echo $FHICL_FILE_PATH

This variable holds a colon-separated list of directories where lar will look for fcl files. The search happens first in the directory from which lar is called and then in the order of the list. Once a file is found, the search for that file stops. All of this is just to tell you that we can override fcl files by modifying them in a work directory and then calling lar from that directory, which is what we are doing.
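If you want to check which copy of an fcl file would be picked up, the lookup order described above can be reproduced in a few lines of Python. This is a sketch of the behavior as stated in this tutorial, not the actual FHiCL lookup code, and find_fcl is a hypothetical helper name.

```python
import os


def find_fcl(filename, search_path, cwd="."):
    """Return the first match for `filename`, or None if nothing is found.

    Mirrors the order described in the text: the working directory first,
    then each directory of the colon-separated search path in order.
    """
    candidates = [cwd] + [d for d in search_path.split(":") if d]
    for directory in candidates:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling find_fcl("dlprod_fclbase_analyzers.fcl", os.environ.get("FHICL_FILE_PATH", "")) from the tutorial folder should then return the local, modified copy rather than the installed one.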

To run our Supera module, edit example_driver.fcl. Change the line

physics.larcv: [ opreco, mcinfo, superaWholeViewMC ]

to

physics.larcv: [ opreco, mcinfo, superaInstanceImage ]

Also, change the physics.end_paths line to

physics.end_paths: [larcv]

By default the file is set up to run not only our supera modules, but also anatree, which converts all the data products into a ROOT ntuple file, and various other litemaker modules, which convert larsoft data products into the larlite format. They are useful, but waste a lot of time in our context. By changing it to larcv, we only run our modules.

Now we should be good to go!

Run the example

Let's run the larsoft event loop lar.

lar -c example_driver.fcl -s /uboone/data/users/tmw/dl_test_files/prodgenie_bnb_nu_cosmic_uboone_0_20170324T023716_gen2_432fa840-cc4e-4be6-860f-94ffe0f451d3_20170809T145905_reco1_20170809T153341_reco2.root

Pro tip: don't type that out. Either copy and paste that file name or use tab-completion.

If everything works, you should see the following root files (when you run ls -lh *.root)

-rw-r--r-- 1 tmw microboone  434 Nov 13 16:27 ana_hist.root
-rw-r--r-- 1 tmw microboone  19M Nov 13 16:27 larcv.root
-rw-r--r-- 1 tmw microboone 3.8M Nov 13 16:27 larlite_mcinfo.root
-rw-r--r-- 1 tmw microboone 668K Nov 13 16:27 larlite_opreco.root

Note that ana_hist.root is empty since we removed the anatree module from physics.end_paths. Our image is in the larcv.root. We also make some larlite files that contain truth information and optical information from the PMT system. The opreco file also has information from the trigger, which is often vital in interpreting the data. For better or worse, we must keep our files bundled together.

Also, if you want to see the final configuration put together by the program when it runs, use the instructions here to dump out the config.

Visualize the image

If you built LArCV on your laptop and installed all the required software for the viewer, you can use the LArCV viewer to inspect the images. Copy the file to your laptop and then run

python $LARCV_BASEDIR/mac/view_rgb.py [path to larcv file]

Check the content

When developing truth labels, it is vital that one verifies the accuracy of the labels. The networks only learn what you teach them, so don't teach them garbage.

(coming soon, example checking script. But for now, example C++ code lives at this repo.)

(also coming soon, example python script to check presence of labels for pixels with above threshold charge.)
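Until those scripts are available, the second check can be sketched in plain Python. The function below assumes the charge image and the label image have already been read out of the larcv file into two equally shaped 2D lists; reading them is not shown, and the function name, the threshold value and the convention that 0 means "no label" are all assumptions made for illustration.

```python
def unlabeled_charge_pixels(charge, labels, threshold=10.0):
    """Return (row, col) of pixels with charge above threshold but no label.

    `charge` and `labels` are equally shaped 2D lists (assumed input format).
    A label of 0 is taken to mean "no label" (assumption).
    """
    if len(charge) != len(labels) or any(
            len(c) != len(l) for c, l in zip(charge, labels)):
        raise ValueError("charge and label images must have the same shape")
    return [(r, c)
            for r, row in enumerate(charge)
            for c, q in enumerate(row)
            if q > threshold and labels[r][c] == 0]


# Toy example: the pixel at (0, 2) has charge 30 but no label.
charge = [[0.0, 12.0, 30.0],
          [5.0, 11.0, 0.0]]
labels = [[0, 7, 0],
          [0, 9, 0]]
missing = unlabeled_charge_pixels(charge, labels)
```

An empty result does not prove the labels are right, only that every bright pixel has some label. In a sample that overlays cosmic data on simulation, unlabeled bright pixels may also be legitimate, so the result needs to be interpreted for your sample.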

Run on the grid

Once you have written your own custom Supera module and verified that it makes correct images, you will want to process a lot of data to make your training sample. This requires running on FermiGrid.

Follow this tutorial to see an example of launching grid jobs.