How to create a ground truth image from LArSoft UBooNECode data - twongjirad/LArLiteSoftCookBook GitHub Wiki
Quick expert summary on how to do this
- Check out a copy of uboonecode using v06_26_01_09 and get the DL_br branch.
- In uboonecode/LArCVImageMaker, create a Supera process module. Use one of the existing ones as a template. Basically, the module gathers truth larsoft data products and passes them to a function that creates your ground truth labels.
- The module definition needs to include a factory instance for the module as well, so that the manager module, LArSoftSuperaSriver, can call up the new Supera module and run it.
- LArSoftSuperaSriver is configured using a supera fcl file, which allows you to run a series of Supera modules. The modules can reference the products of upstream Supera modules in case that is helpful in generating your ground truth.
- After you write your module, you need to set up the fcl files to run it:
  - Add a configuration for it in dlprod_fclbase_analyzers.fcl by creating a copy of SuperaWholeView and then setting its supera fcl file (supera_params).
  - Add the configuration to the set of DL analyzers, dlprod_analyzers.
  - Then you can call this analyzer in your driver fcl file. An example to modify is uboonecode/fcl/deeplearn/standard_larcv_uboone.fcl
Tutorial
Background notes
This work will be done on the Fermilab machines, so you will need an account there.
The goal of this tutorial is to walk you through the task of generating truth information in the form of a labeled image. The task will be to provide labels that provide us a way of determining distinct particles in an image. This is a specific task so that the tutorial can be concrete. Of course, your truth labels and tasks will almost certainly be different.
Truth information includes (but is not limited to) the following: 1) pixel-wise labels in an image, 2) a list of labels for a whole image and 3) bounding boxes indicating the location of an object in an image. The LArCV framework provides a means of storing all three types of truth information within a ROOT file. It can, of course, store images, be they images of truth labels or images of the TPC data. It does so with the Image2D class. LArCV also allows one to store a set of labels and bounding boxes using the ROI class. Each ROI instance can store a bounding box, i.e. an (x,y) position along with a height and width, AND a label. You can choose to use it to store one type of information, the other, or both.
For example, if you want to label an entire image as either neutrino or cosmic, you would save an ROI object with either a neutrino or cosmic label for each image. You would then leave the bounding box info unused. However, if you wanted to save the bounding box positions of individual particles coming from a neutrino interaction, you would save the box information and particle ID of each particle using an ROI
instance for each particle.
The task described in this tutorial will be on how to fill an image with truth information.
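To make the three kinds of truth information a bit more concrete, here is a small plain-Python sketch. The classes LabelImage and RegionLabel are hypothetical stand-ins invented for this illustration; they are not the LArCV Image2D and ROI classes and do not mirror their real interfaces. The bounding box numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-ins for illustration only. These are NOT the LArCV
# Image2D and ROI classes and do not reproduce their interfaces.


@dataclass
class LabelImage:
    """Pixel-wise truth: one integer label per pixel (0 means 'no label')."""
    rows: int
    cols: int
    pixels: List[List[int]]


@dataclass
class RegionLabel:
    """A label plus an optional bounding box given as (x, y, width, height)."""
    label: str
    bbox: Optional[Tuple[float, float, float, float]] = None


def blank_image(rows: int, cols: int) -> LabelImage:
    """Create an image where every pixel starts out unlabeled."""
    return LabelImage(rows, cols, [[0] * cols for _ in range(rows)])


# 1) Pixel-wise labels: mark a single pixel as belonging to label 1.
image = blank_image(4, 6)
image.pixels[1][2] = 1

# 2) One label for the whole image: the bounding box is left unused.
event_label = RegionLabel(label="neutrino")

# 3) One entry per particle, each with its own box (made-up numbers).
particle_labels = [
    RegionLabel("muon", bbox=(10.0, 20.0, 50.0, 120.0)),
    RegionLabel("proton", bbox=(12.0, 22.0, 8.0, 15.0)),
]
```

The point is only that the same "label plus optional box" record covers both the whole-image case and the per-particle case, which is how the ROI class is used in the examples above.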
Setup your larsoft environment
When starting out, you will need to get a copy of larsoft that contains the image conversion tools required and then build that code. You will then modify this copy and rebuild when you implement your specific code for generating your ground truth image.
When coming back to the task, you do not need to re-download the code. But each time you log into the Fermilab machines, you will need to setup your code's environment variables again.
The general description for both of these tasks is given here.
Steps (checking out a fresh copy)
First setup access to the uboonecode software
source /grid/fermiapp/products/uboone/setup_uboone.sh
Next, activate a specific version of uboonecode (warning this changes over time)
setup uboonecode v06_26_01_09 -q e10:prof
When you do this, environment variables setting up the different pieces of larsoft necessary for this version of uboonecode will have been defined. For example, you can run echo $UBOONECODE_DIR
to see where this copy of uboonecode lives.
However, we don't want to use the existing copy. We want to get our own version of the source, build it, and run our copy of uboonecode instead. To do this, first make a directory where our copy will live (you can choose whatever directory name you want, but I've made a concrete choice of dl_brdev here).
mkdir dl_brdev
Note, on the uboone machines, you should make this directory somewhere on the uboone app folder
/uboone/app/users/[your username]
(if this location doesn't exist, you can make one.)
Now we will use mrb, a tool that helps us manage our larsoft development environment, to start a new development space. Go into the dl_brdev folder and run
cd dl_brdev
mrb newDev
You'll see something similar to this:
building development area for larsoft v06_26_01_08 -q e10:prof
MRB_BUILDDIR is /uboone/app/users/tmw/dev/dl_brdev/build_slf6.x86_64
MRB_SOURCE is /uboone/app/users/tmw/dev/dl_brdev/srcs
INFO: copying /cvmfs/fermilab.opensciencegrid.org/products/larsoft/larsoft/v06_26_01_08/releaseDB/base_dependency_database
IMPORTANT: You must type
source /uboone/app/users/tmw/dev/dl_brdev/localProducts_larsoft_v06_26_01_08_e10_prof/setup
NOW and whenever you log in
We follow the advice
source localProducts_larsoft_v06_26_01_08_e10_prof/setup
Now we have this development environment set up. But to develop, we need source code. We go into the sources folder, srcs, and type
cd srcs
mrb g uboonecode
We have to check out a specific branch of the uboonecode repo. Go into the uboonecode directory that was just created inside srcs
cd uboonecode/
git checkout DL_br
You should see this
Branch DL_br set up to track remote branch DL_br from origin.
Switched to a new branch 'DL_br'
Now that we have copies of the source, let's finish setting up the mrb build environment with
mrbsetenv
Now we can start the build
mrb i -j4
You can go get a snack or coffee at this point. This will take 5-10 minutes.
Setting up Steps (re-activate an existing copy)
First, setup the cvmfs software
source /grid/fermiapp/products/uboone/setup_uboone.sh
Go to your copy of uboonecode. Following the example above, this would be something like
/uboone/app/users/[your username]/dl_brdev
Reactivate your build environment
source localProducts_larsoft_v06_26_01_08_e10_prof/setup
mrbsetenv
(Note that localProducts_larsoft_v06_26_01_08_e10_prof
might be different. It depends on the current version of the code at the time the software was setup.) If everything goes OK, you should see something like this
The working build directory is /uboone/app/users/tmw/dev/dl_brdev/build_slf6.x86_64
The source code directory is /uboone/app/users/tmw/dev/dl_brdev/srcs
----------- check this block for errors -----------------------
----------------------------------------------------------------
Now you're ready to go.
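If you want a quick sanity check that the environment really is active, a few lines of Python can look for the variables this tutorial mentions (MRB_SOURCE, MRB_BUILDDIR, MRB_INSTALL and FHICL_FILE_PATH). This is only a sketch: the helper name missing_variables is made up here, and treating exactly these four variables as "the environment is ready" is an assumption, not an official check.

```python
import os
from typing import Iterable, List, Mapping

# Variables named elsewhere in this tutorial. That all four are set after
# sourcing the setup script and running mrbsetenv is an assumption.
REQUIRED_VARIABLES = ("MRB_SOURCE", "MRB_BUILDDIR", "MRB_INSTALL", "FHICL_FILE_PATH")


def missing_variables(required: Iterable[str], environ: Mapping[str, str]) -> List[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables(REQUIRED_VARIABLES, os.environ)
    if missing:
        print("Not set (did you source the setup script and run mrbsetenv?):",
              ", ".join(missing))
    else:
        print("Environment looks ready.")
```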
Create the files we need for our truth to image conversion
With the uboonecode environment built, we'll walk through how to create a program to extract some information from larsoft data files and save it to an image. In particular, our example will be labeling pixels of each image as coming from one instance of a particle or another. We will also save the bounding boxes around those instances.
But first, it might be helpful to review this small set of slides. The slides attempt to provide a quick overview of LArSoft, LArCV and how the interface routines (Supera modules) are organized and operate.
We have provided two template files that one can use to build a Supera module
- SuperaExampleTemplate.h.example
- SuperaExampleTemplate.cxx.example
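The first thing is to make a copy of both of these files, replacing ExampleTemplate with some name that describes the output being made. We will change it to InstanceImage. (If the template file names in your checkout end differently from what is shown here, adjust the commands to match.)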
cp SuperaExampleTemplate.h.template SuperaInstanceImage.h
cp SuperaExampleTemplate.cxx.template SuperaInstanceImage.cxx
Note that you also need to edit the contents of the header and source files so that every instance of ExampleTemplate becomes InstanceImage.
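The copy-and-rename step can also be scripted. Below is a minimal sketch in Python; the function name instantiate_template is made up for this example, and the suffix argument is whatever extra extension the template files carry in your checkout. It also swaps an all-caps version of the name, on the assumption that the header may use an upper-case include guard; check the result by hand either way.

```python
from pathlib import Path
from typing import List


def instantiate_template(directory: Path, old: str, new: str, suffix: str) -> List[Path]:
    """Copy Supera<old>.{h,cxx}<suffix> to Supera<new>.{h,cxx} and rename inside.

    `suffix` is the extra extension on the template files, for example
    ".example"; look in the folder to see which one your checkout uses.
    """
    created = []
    for ext in (".h", ".cxx"):
        source = directory / f"Supera{old}{ext}{suffix}"
        target = directory / f"Supera{new}{ext}"
        text = source.read_text()
        # Replace the class name, then an upper-case variant in case the
        # header has an include guard spelled that way (an assumption).
        text = text.replace(old, new).replace(old.upper(), new.upper())
        target.write_text(text)
        created.append(target)
    return created
```

From the folder holding the templates, a call such as instantiate_template(Path("."), "ExampleTemplate", "InstanceImage", ".example") would produce SuperaInstanceImage.h and SuperaInstanceImage.cxx.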
Once you replace the name, this example should build. Because we created a new file, we should rerun cmake to rebuild the build folder:
mrb i -j4
This example takes in truth information from LArSoft data products. The truth data products it uses are
- MCTrack: a vector of MCTrack class instances where each represents the true trajectory history through the TPC active volume for "track-like" particles: muons, pions, protons, recoiling nuclei (and nuclear fragments)
- MCShower: a vector of MCShower class instances where each instance contains information on "shower-like" particles: electrons and showers produced by a photon. The exact trajectory history is not stored, as that is not something that is well defined. Instead, the shower start and the initial direction are stored.
- SimCh: a vector of SimCh class instances which allows one to map charge seen at the wire at a given time to the particle that deposited the energy and the true position where it was deposited.
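To give a feel for what "labeling pixels by particle instance" means, here is a toy sketch in plain Python. It is not the code in the Supera template (that is C++ and works on the real data products). The deposit tuples are a simplified, hypothetical picture of the SimCh information, and the rule used here (each pixel gets the ID of the particle that deposited the most energy in it) is an assumption chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# (row, col, track_id, energy): a simplified, hypothetical view of what the
# SimCh information provides once it is mapped onto image pixels.
Deposit = Tuple[int, int, int, float]


def fill_instance_image(rows: int, cols: int, deposits: Iterable[Deposit],
                        background: int = -1) -> List[List[int]]:
    """Label each pixel with the track ID that deposited the most energy in it."""
    # Sum the energy per (pixel, track) pair, ignoring deposits off the image.
    energy: Dict[Tuple[int, int], Dict[int, float]] = {}
    for row, col, track_id, e in deposits:
        if 0 <= row < rows and 0 <= col < cols:
            per_track = energy.setdefault((row, col), {})
            per_track[track_id] = per_track.get(track_id, 0.0) + e

    # Pixels with no deposits keep the background value.
    image = [[background] * cols for _ in range(rows)]
    for (row, col), per_track in energy.items():
        image[row][col] = max(per_track, key=per_track.get)
    return image
```

For example, two deposits from track 9 in the same pixel can together outweigh a single larger deposit from track 7, in which case the pixel is labeled 9.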
Setup the configuration (fcl) files for the new module
Pick a folder to run the code -- it can be anywhere. For this tutorial, we'll make a folder in the LArCVImageMaker
folder. In that folder create our working directory (and go into it)
mkdir tutorial
cd tutorial
First, we need to create a parameter block for the LArSoft module (LArSoftSuperaSriver_module) that will call our Supera module. We can use an existing fcl file as a template. Copy the following file into our working directory with
cp ../../../fcl/deeplearn/dlprod_fclbase_analyzers.fcl .
(From the root folder of the build environment this folder is srcs/uboonecode/fcl/deeplearn
.)
Open it. You'll see a parameter block that is serving as a template for other variations
SuperaWholeView: {
module_type: "LArSoftSuperaSriver"
supera_params: "supera_segment.fcl"
out_filename: "larcv.root"
unique_filename: false
stream: "mc"
}
Note the key parameter, module_type. This is how LArSoft knows which module this parameter block is for. Once we insert it into LArSoft's event loop (later in this tutorial), LArSoft knows to create an instance of this module using these parameters.
Now we will use this block to create an instance for our version of Supera. You'll see other instances in the file: SuperaFocusedView, SuperaBasic, SuperaMichelMC, etc.
Let's make one for ourselves
SuperaInstanceImage: @local::SuperaWholeView
SuperaInstanceImage.supera_params: "supera_instance_example.fcl"
What this does is create a copy of the parameter block and then change the supera parameter fcl file to our custom one, which we will create next. But first, add our new block to dlprod_analyzers, a parameter block that keeps a copy of all of our Supera module variations (along with some larsoft-to-larlite conversion modules). The block should look like this in the end
dlprod_analyzers:
{
mcinfo: @local::litemc_mcinfo
simch: @local::litemc_simch
wire: @local::litemc_wire
opdigit: @local::litemc_opdigit
opreco: @local::litemc_opreco
reco2d: @local::litemc_reco2d
superaWholeView: @local::SuperaBasic
superaWholeViewMC: @local::SuperaMCBasic
superaFocusedView: @local::SuperaFocusedView
superaFocusedViewPlus: @local::SuperaFocusedViewPlus
superaFocusedView3D: @local::SuperaFocusedView3D
superaMichelMC: @local::SuperaMichelMC
superaMichelData: @local::SuperaMichelData
superaCCPi0MC: @local::SuperaCCPi0MC
superaCCPi0Data: @local::SuperaCCPi0Data
superaInstanceImage: @local::SuperaInstanceImage
}
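If the copy-then-override pattern used for SuperaInstanceImage is unfamiliar, it may help to picture it with ordinary Python dictionaries. This is only an analogy for what the two fcl lines do, not a fcl parser; the helper name derive_block is made up for this sketch. The values are the ones from the SuperaWholeView block shown earlier.

```python
import copy
from typing import Any, Dict

# Same values as the SuperaWholeView parameter block in the fcl file.
supera_whole_view: Dict[str, Any] = {
    "module_type": "LArSoftSuperaSriver",
    "supera_params": "supera_segment.fcl",
    "out_filename": "larcv.root",
    "unique_filename": False,
    "stream": "mc",
}


def derive_block(template: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Copy a parameter block, then overwrite only the given keys."""
    block = copy.deepcopy(template)  # the template itself is left untouched
    block.update(overrides)
    return block


# Analogous to:
#   SuperaInstanceImage: @local::SuperaWholeView
#   SuperaInstanceImage.supera_params: "supera_instance_example.fcl"
supera_instance_image = derive_block(
    supera_whole_view, supera_params="supera_instance_example.fcl"
)
```

Everything except supera_params is inherited from the template, and the template keeps its original value.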
Now let's create the supera configuration file. Let's use a previous example:
cp ../supera_basic.fcl supera_instance_example.fcl
Open the file, you'll see the parameters, ProcessType
and ProcessName
ProcessType: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask"]
ProcessName: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask"]
The first set of values are the Supera module types we are going to run. The second is the instance name assigned to it. (We often use the module name as the instance name as you can see.)
You'll also see ProcessList. This block contains the configuration for the modules listed above. We use the instance name to associate a parameter block with the Supera module instance.
So we need to make a block for our new module. Somewhere in the ProcessList block, add
SuperaInstanceImage: {
Verbosity: 0
OutImageLabel: "instance"
LArMCTruthProducer: "generator"
LArMCTrackProducer: "mcreco"
LArMCShowerProducer: "mcreco"
LArSimChProducer: "largeant"
TimeOffset: 2400
Origin: 0
}
and also add our module to the list of modules so that ProcessType
and ProcessName
look like
ProcessType: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask","SuperaInstanceImage"]
ProcessName: ["SuperaMetaMaker","SuperaWire","SuperaOpDigit","SuperaChStatus","WireMask","SuperaInstanceImage"]
In SuperaInstanceImage, you might notice a bunch of parameters that we didn't request, for example LArMCTruthProducer, LArMCTrackProducer, etc. These LAr[]Producer parameters tell the SuperaBase module (from which we inherited) which larsoft data products to use and their names.
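A common slip when adding a module is to update one of ProcessType and ProcessName but not the other, or to forget the parameter block. The sketch below checks for that on values you have typed in as Python lists and a dictionary; it does not read fcl files. The function name check_supera_config is made up, and the rule that every instance name should have a block in ProcessList is an assumption based on the description above.

```python
from typing import Dict, List, Sequence


def check_supera_config(process_type: Sequence[str],
                        process_name: Sequence[str],
                        process_list: Dict[str, dict]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # The two lists are paired entry by entry, so their lengths must agree.
    if len(process_type) != len(process_name):
        problems.append(
            f"ProcessType has {len(process_type)} entries "
            f"but ProcessName has {len(process_name)}"
        )
    # Assumption: each instance name needs a parameter block in ProcessList.
    for name in process_name:
        if name not in process_list:
            problems.append(f"no ProcessList block named {name}")
    return problems


if __name__ == "__main__":
    names = ["SuperaMetaMaker", "SuperaWire", "SuperaOpDigit",
             "SuperaChStatus", "WireMask", "SuperaInstanceImage"]
    # Pretend we forgot to add the SuperaInstanceImage block.
    blocks = {name: {} for name in names[:-1]}
    for problem in check_supera_config(names, names, blocks):
        print(problem)
```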
Finally, we need to create a driver fcl file -- the config that pulls together all the different configuration blocks into one final parameter set that we can use to run the larsoft event loop program, lar
.
Again, we'll use an existing example as a template. Copy one using
cp ../../../fcl/deeplearn/standard_supera_mc_noreco2d.fcl example_driver.fcl
Note how the top of this file has the line
#include "dlprod_fclbase_analyzers.fcl"
This command includes the fcl file we modified, which lists the different Supera configurations.
Advanced note: we copied this from the uboonecode/fcl/deeplearn
folder. Because the uboonecode/fcl
folder is where the experiment keeps (for the most part) official
fcl files, there will be a copy of dlprod_fclbase_analyzers.fcl
installed in the job
folder for the build, ${MRB_INSTALL}/uboonecode/v06_26_01_07/job/
. Note that when the event loop tries to configure itself by reading in the driver fcl file, it will search for fcl files included using the #include
command. Where does it look for these? It will use the environment variable FHICL_FILE_PATH
. You can view this variable using
echo $FHICL_FILE_PATH
This variable holds a colon-separated list of directories where lar will look for fcl files. The search happens first in the directory from which lar is called and then in the order of the list. Once a file is found, the search for that file stops. All of this is just to tell you that we can override fcl files by modifying them in a work directory and then calling lar from that directory, which is what we are doing.
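The lookup order described here (current directory first, then each entry of FHICL_FILE_PATH in turn, stopping at the first hit) can be mimicked in a few lines of Python. This is a sketch of the rule as stated in this tutorial, not the actual lookup code inside lar, and the function name find_fcl is made up. It can be handy for checking which copy of a file would win.

```python
import os
from typing import Optional


def find_fcl(name: str, fhicl_file_path: str, cwd: str) -> Optional[str]:
    """Return the first match for `name`, searching `cwd` and then the path list."""
    # Working directory first, then the colon-separated entries in order.
    candidates = [cwd] + [d for d in fhicl_file_path.split(":") if d]
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path  # stop at the first hit, as described above
    return None


if __name__ == "__main__":
    found = find_fcl("dlprod_fclbase_analyzers.fcl",
                     os.environ.get("FHICL_FILE_PATH", ""),
                     os.getcwd())
    print(found if found else "not found")
```

Run from the tutorial folder, this should print the path of our modified copy rather than the installed one.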
To run our Supera module, edit example_driver.fcl. Change the line
physics.larcv: [ opreco, mcinfo, superaWholeViewMC ]
to
physics.larcv: [ opreco, mcinfo, superaInstanceImage ]
Also, change the physics.end_paths
line to
physics.end_paths: [larcv]
By default the file is set up to run not only our supera modules, but also anatree, which converts all the data products into a root ntuple file, and various other litemaker modules, which convert larsoft data products into larlite data products. They are useful, but waste a lot of time in our context. By changing it to larcv, we only run our modules.
Now we should be good to go!
Run the example
Let's run the larsoft event loop lar
.
lar -c example_driver.fcl -s /uboone/data/users/tmw/dl_test_files/prodgenie_bnb_nu_cosmic_uboone_0_20170324T023716_gen2_432fa840-cc4e-4be6-860f-94ffe0f451d3_20170809T145905_reco1_20170809T153341_reco2.root
Pro tip: don't type that out. Either copy and paste that file name or use tab-completion.
If everything works, you should see the following root files (when you run ls -lh *.root
)
-rw-r--r-- 1 tmw microboone 434 Nov 13 16:27 ana_hist.root
-rw-r--r-- 1 tmw microboone 19M Nov 13 16:27 larcv.root
-rw-r--r-- 1 tmw microboone 3.8M Nov 13 16:27 larlite_mcinfo.root
-rw-r--r-- 1 tmw microboone 668K Nov 13 16:27 larlite_opreco.root
Note that ana_hist.root is empty since we removed the anatree module from physics.end_paths. Our image is in larcv.root. We also make some larlite files that contain truth information and optical information from the PMT system. The opreco file also has information from the trigger, which is often vital in interpreting the data. For better or worse, we must keep our files bundled together.
Also, if you want to see the final configuration put together by the program when it runs, use the instructions here to dump out the config.
Visualize the image
If you built LArCV on your laptop and installed all the required software for the viewer, you can use the LArCV viewer to inspect the images. Copy the file to your laptop and then run
python $LARCV_BASEDIR/mac/view_rgb.py [path to larcv file]
Check the content
When developing truth labels, it is vital that one verifies the accuracy of the labels. The networks only learn what you teach them, so don't teach them garbage.
(coming soon, example checking script. But for now, example C++ code lives at this repo.)
(also coming soon, example python script to check presence of labels for pixels with above threshold charge.)
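In the meantime, the idea behind the second check can be sketched in plain Python. The sketch assumes you have already pulled one charge image and its label image out of larcv.root into nested lists of the same shape; how to do that is not shown here. The function name unlabeled_pixels, the background value of -1 and the threshold are all choices made for this illustration.

```python
from typing import List, Sequence, Tuple


def unlabeled_pixels(adc: Sequence[Sequence[float]],
                     labels: Sequence[Sequence[int]],
                     threshold: float,
                     background: int = -1) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels with charge above threshold but no label."""
    same_shape = len(adc) == len(labels) and all(
        len(a) == len(l) for a, l in zip(adc, labels)
    )
    if not same_shape:
        raise ValueError("adc and label images must have the same shape")
    return [
        (r, c)
        for r, row in enumerate(adc)
        for c, value in enumerate(row)
        if value > threshold and labels[r][c] == background
    ]


if __name__ == "__main__":
    adc = [[0.0, 12.0], [30.0, 5.0]]
    labels = [[-1, 3], [-1, -1]]
    # Pixel (1, 0) has charge above threshold but carries no label.
    print(unlabeled_pixels(adc, labels, threshold=10.0))
```

Some unlabeled charge is expected in a sample with cosmic overlay or noise, so a non-empty result is a prompt to look at the image, not proof of a bug.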
Run on the grid
Once you have written your own custom Supera module and verified that it makes correct images, you will want to run over a lot of data to make your training sample. This requires running on FermiGrid.
Follow this tutorial to see an example of launching grid jobs.