Simulation Examples - panoseti/PANOSETI-High-Energy-Analysis-Pipeline GitHub Wiki

Running the pipeline

Here we will walk through some simple examples of the full simulation/analysis pipeline for a small PANOSETI array. First we will start with the simple case of running a single instance of CORSIKA locally on your machine. Given PANOSETI's high energy threshold (~10s of TeV), high performance computing is a necessity to build up a sufficient data set. See more efficient ways of running CORSIKA on multiple cores here, or submitting jobs to a cluster here.

Steering File

If CORSIKA has already been installed and compiled, you just need to provide it with a steering file to run it. An example can be found in simulation-tools/example/example.inp. It should look something like this:

RUNNR 1
EVTNR 1
NSHOW 10
PRMPAR 1
ERANGE 3E4 1E5
ESLOPE 0
THETAP 0. 30.
PHIP 0. 360.
SEED 805 0 0
SEED 3234 0 0
SEED 305 0 0
SEED 2220 0 0
ATMOD 1
MAGNET 25.2 40.88
ARRANG 12.77
ELMFLG F T
RADNKG 200.E2
FIXCHI 0.
HADFLG 0 0 0 0 0 2
QGSJET T 0
QGSSIG T
HILOW 100.
ECUTS 0.30 0.05 0.02 0.02
MUADDI F
MUMULT T
LONGI T 20. F F
MAXPRT 50
PAROUT F F
ECTMAP 1.E6
DEBUG F 6 F 1000000
DIRECT ./
USER user
HOST host
ATMOSPHERE 61 T
TELFIL ./example.telescope
OBSLEV 1239.E2
CSCAT 1 100.E2 100.E2
CERFIL 0
CERSIZ 5.
CWAVLG 200. 700.
TELESCOPE 53.59E2 73.52E2 1E2 0.25E2
TELESCOPE 53.59E2 -80.48E2 1E2 0.25E2
TELESCOPE -107.18E2 6.95E2 1E2 0.25E2
EXIT

Descriptions of all these parameters can be found in the CORSIKA userguide.

For now, you might want to change the USER and HOST fields to refer to yourself and your machine. For example,

USER nkorzoun
HOST gamma22

Other options you may want to change according to personal preference are TELFIL and DIRECT, which respectively tell CORSIKA what you would like to name your data file and where the output should be written. For now I will assume that those two are unchanged from above.

Be mindful that NSHOW, ERANGE, CSCAT, and CERSIZ are the parameters that most strongly affect how long a simulation takes to run. It is recommended to keep CERSIZ, the photon bunch size, low (between 5 and 10).

This example will use the triangle configuration at Lick Observatory:

OBSLEV 1239.E2
TELESCOPE 53.59E2 73.52E2 1E2 0.25E2
TELESCOPE 53.59E2 -80.48E2 1E2 0.25E2
TELESCOPE -107.18E2 6.95E2 1E2 0.25E2

For the sake of simplicity, here the telescopes are all placed at the same elevation: 1 meter above the observation level, 1239 m a.s.l. (CORSIKA lengths are given in centimeters, hence the E2 factors converting meters to centimeters.) Be mindful that Cherenkov photons generated below OBSLEV are deleted, and data are only kept for telescopes at the lowest observation level. So don't give telescopes a negative z-coordinate, and never include more than one instance of OBSLEV.
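
If you ever need to generate TELESCOPE lines for a different array layout, a tiny helper keeps the unit conversion straight. The telescope_line function below is hypothetical (it is not part of simulation-tools), shown only to illustrate the meters-to-centimeters conversion:

# Hypothetical helper, not part of simulation-tools: format a TELESCOPE
# line from coordinates given in meters. CORSIKA expects centimeters,
# hence the factor of 100 written as the E2 suffix in the steering file.
def telescope_line(x_m, y_m, z_m, r_m):
    return 'TELESCOPE {:.2f}E2 {:.2f}E2 {:.2f}E2 {:.2f}E2'.format(x_m, y_m, z_m, r_m)

print(telescope_line(53.59, 73.52, 1.0, 0.25))
# TELESCOPE 53.59E2 73.52E2 1.00E2 0.25E2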

The atmospheric model being used, ATMOSPHERE 61 T, is the VERITAS winter atmosphere, available here. Place this file in the CORSIKA run directory, e.g. /path/to/corsika/install/run.

Executing CORSIKA

If you put the example steering file, example.inp, in the CORSIKA run directory, then you can execute CORSIKA with just

./corsika77410Linux_QGSII_urqmd <example.inp

or to send stdout to a logfile,

./corsika77410Linux_QGSII_urqmd <example.inp > logfile.log

On my machine this takes ~90 minutes to complete. Your CORSIKA command will look different if you have installed a different version or are using different hadronic interaction models. When it finishes, CORSIKA will create example.telescope in the same directory. This file holds the Cherenkov photon information that is fed into corsikaIOreader.

If you try to run the same command again, CORSIKA will complain that it doesn't have write access to example.telescope. You will have to either remove example.telescope or change the TELFIL field in the steering file.

corsikaIOreader

Move example.telescope into the directory where corsikaIOreader is installed. Execute corsikaIOreader with

corsikaIOreader -cors example.telescope -histo example.root -abs CORSIKA

This should take less than a minute to run with only 10 showers. It can take tens of minutes or even an hour for thousands of showers.

panodisplay

Navigate to the simulation-tools/ directory and open ROOT.

root -l

Then load in the panodisplay.C macro

.L panodisplay.C

and read in the corsikaIOreader root file. So long as example.root and panodisplay.C are in the same directory, this can be done with

readFile("example.root")

In this example root file, there should be 10 events. Type

panodisplay(8)

to look at event 8. Two windows should open that look like this:

[example event display image]

In one window is the image as seen by each telescope in the camera plane. In the other window is a map of the event in the ground plane. The telescope positions are marked by black circles and are labeled T1, T2, T3. The origin (0,0) corresponds roughly to the centroid of the triangle configuration, with North = (+y) and East = (+x). This differs from the CORSIKA definition of the coordinate system, in which +x points to magnetic north and +y points west. A good explanation of coordinate systems can be found here, and a diagram of the CORSIKA coordinates can be found in the CORSIKA userguide on page 132.

In the camera images, if Hillas parameterization is successful, it is represented by a black ellipse. In the event map, the simulated core position is marked by a red 'x', and if core reconstruction is successful, the reconstructed core is marked by a blue '+'.

The ROOT terminal will also print some basic information about the event: first some image moments and Hillas parameters for each telescope, then some parameters of the simulated shower. The shower information can also be obtained with

cout<<showerInfo(8)

So typing

for(int i=0;i<10;i++){
    cout<<showerInfo(i);
}

in ROOT will print the shower parameters for all 10 events.

If you type

panodisplay(8)

a second time, the image will probably look slightly different. This is a consequence of the randomness introduced by simulating NSB and the SiPM detector response. The random number generator is seeded, however, so quitting ROOT and retyping the same sequence of commands should yield identical results.

Typing

telEvent(1,8)

will return the ROOT histogram for just telescope 1 in event 8. So typing

telEvent(1,8)->Draw("COLZ0")

will draw the image (without parameterization).

Finally, typing

paramCSV(true)

will attempt to parameterize every event in the root file and write the results to example.csv. This is useful for transferring the analysis to a much friendlier programming language, like python :P

The boolean parameter indicates whether or not you want reconstructed values included in the csv (namely arrival direction and impact parameter).
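
Once example.csv exists, picking the analysis back up in python is straightforward. Below is a minimal sketch using pandas; the commented-out column names are hypothetical, so check the header row of your example.csv for what paramCSV actually writes:

# A minimal sketch of loading the parameterized events in python,
# assuming the csv was written by paramCSV(true) as described above.
import pandas as pd

df = pd.read_csv('example.csv')
print(df.columns)  # see which Hillas/reconstruction parameters were written
print(df.head())   # first few parameterized events

# Example cut, assuming hypothetical 'width' and 'length' columns exist:
# compact = df[df['width'] / df['length'] < 0.5]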

Running the pipeline on multiple cores

CORSIKA unfortunately cannot run in parallel with the IACT option enabled, but we can simulate many showers in pseudo-parallel with some clever scripting: run multiple instances of the CORSIKA-corsikaIOreader pipeline on multiple cores, then combine the .root files afterward.

corsikaBatchGenerator.ipynb

Running multiple instances of CORSIKA means we will need to create multiple steering files. At the very least, this means changing the SEED and TELFIL fields in each steering file. For more than a few instances, this is very tedious. corsikaBatchGenerator.ipynb is a python notebook that automates this task.

The notebook starts with a definition of a template steering file very similar to the one above. The next few code blocks will generate modified steering files when fed a structure I call a controller. The controller is essentially a list of lists with the following format:

controller = [
     [keyword1, [arg1_val1, arg1_val2, ...], [arg2_val1, arg2_val2, ...], ...],
     [keyword2, [arg1_val1, arg1_val2, ...], ...],
     [keyword3, [arg1_val1, arg1_val2, ...], [arg2_val1, arg2_val2, ...], [arg3_val1, arg3_val2, ...], ...]
]

keyword refers to any field in the CORSIKA steering file. Valid keywords are defined in one of the helper functions in the notebook; if a particular keyword is not already defined, it can easily be added. Each keyword entry in the controller tells the generator that the corresponding field in the steering file is to be iterated over. Some keywords like PRMPAR take only one argument, which corresponds to a single list [arg1_val1, arg1_val2, ...]. So if we wanted to make two identical CORSIKA steering files, but change the primary particle of the second file, the controller would look like

controller = [
    ['PRMPAR', ['1', '13']]
]

In the first file, PRMPAR will be set to 1 (gamma) and in the second file, PRMPAR will be set to 13 (proton). The number of values provided in the list ['1','13'] tells the generator how many files are going to be made. Also note that everything is surrounded by single quotes, as everything in the lists must be a string.
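
To make the indexing concrete, here is a minimal sketch of how a generator can expand a controller into the fields of the i-th steering file. The expand function here is hypothetical; the notebook's actual helper functions may differ:

# Hypothetical sketch: the i-th steering file takes the i-th value from
# each argument list of every controller entry.
def expand(controller, i):
    """Return {keyword: 'arg1_val arg2_val ...'} for steering file i."""
    fields = {}
    for entry in controller:
        keyword, arg_lists = entry[0], entry[1:]
        fields[keyword] = ' '.join(args[i] for args in arg_lists)
    return fields

controller = [['PRMPAR', ['1', '13']]]
print(expand(controller, 0))  # {'PRMPAR': '1'}  -> gamma
print(expand(controller, 1))  # {'PRMPAR': '13'} -> proton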

The SEED field takes multiple arguments, so a controller that changes the seeds (here along with ERANGE and CSCAT) looks more complicated, reflecting the need to set every value of every argument for each steering file we generate.

controller = [
   ['ERANGE',
       ['1E4','1E5','1E6'],
       ['1E4','1E5','1E6']],
   ['CSCAT', 
       ['1','1','1'], 
       ['500.E2','850.E2','1200.E2'], 
       ['500.E2','850.E2','1200.E2']],
   ['SEED1',
       ['200','208','216'],
       ['0','0','0'],
       ['0','0','0']],
   ['SEED2',
       ['202','210','218'],
       ['0','0','0'],
       ['0','0','0']],
   ['SEED3',
       ['204','212','220'],
       ['0','0','0'],
       ['0','0','0']],
   ['SEED4',
       ['206','214','222'],
       ['0','0','0'],
       ['0','0','0']]
]

This controller generates three steering files, each with a different simulated energy, simulation area (the area the shower core is scattered over), and random seeds. A controller like this might be useful for an effective area study where you want to simulate larger scattering areas for high energy showers, but not necessarily for the low energy showers. This prevents biasing your triggers toward high energy events at small impact parameters, and avoids wasting time simulating low energy showers that never trigger at large impact parameters.

Even with just a few keywords and a small number of steering files, manually typing out all of these values can still be tedious and ultimately doesn't save much time. This is where something more pythonic will reduce the time spent setting up the sims.

import numpy as np

nFiles = 10  # number of input files
nShow = 1    # number of showers per run

# random seeds for each steering file; CORSIKA expects positive seed values
seed1 = [str(np.random.randint(1, 10000)) for i in range(nFiles)]
seed2 = [str(np.random.randint(1, 10000)) for i in range(nFiles)]
seed3 = [str(np.random.randint(1, 10000)) for i in range(nFiles)]
seed4 = [str(np.random.randint(1, 10000)) for i in range(nFiles)]

zeros = [str(0) for x in range(nFiles)]

controller = [
    ['EVTNR', [str(x) for x in range(1, (nFiles*nShow)+1, nShow)]],  # starting event number of each run
    ['SEED1', seed1, zeros, zeros],
    ['SEED2', seed2, zeros, zeros],
    ['SEED3', seed3, zeros, zeros],
    ['SEED4', seed4, zeros, zeros]
]

With a controller like this one, it is far easier to generate hundreds or even thousands of steering files. This particular controller will make 10 steering files, each simulating a single shower. When CORSIKA finishes, we can run several instances of corsikaIOreader on different cores to create 10 root files, and then combine the root files into a single data file with something like hadd -f my.root *.root

The steering files are generated at the bottom of the notebook when genRuns(1, controller) is called. The steering files are named batch1.inp, batch2.inp, ... by default. Calling genRuns(10, controller), for example, will start the input file indexing at 10 instead of 1. The .telescope data file created by CORSIKA is also automatically incremented by this function, assuming your template file still names it like TELFIL ./DATbatch0.telescope.
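
For example, assuming the controller from the previous section is defined and the notebook's genRuns helper is loaded, generating two non-overlapping batches might look like:

# Usage sketch, assuming genRuns from corsikaBatchGenerator.ipynb is in scope.
genRuns(1, controller)    # writes batch1.inp ... batch10.inp
genRuns(11, controller)   # continues the numbering: batch11.inp ... batch20.inp
# note: re-randomize the seeds in the controller before the second call,
# or the two batches will share seeds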

Executing multiple instances of CORSIKA and corsikaIOreader

Multiple instances of CORSIKA can be run just by executing the command multiple times

./corsika77410Linux_QGSII_urqmd <batch1.inp > batch1.log &
./corsika77410Linux_QGSII_urqmd <batch2.inp > batch2.log &
./corsika77410Linux_QGSII_urqmd <batch3.inp > batch3.log &
...

so long as your machine has enough cores to accommodate however many calls. The trailing ampersand sends each job to the background so the terminal isn't locked up for the duration of the run, which saves you from having to open many terminal windows. Something similar can be done with corsikaIOreader.
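
If you would rather script this than type each command, here is a minimal python sketch that launches one CORSIKA process per batch file and waits for all of them. It assumes the batch*.inp files and the executable from above sit in the current directory, and it starts everything at once, so mind your core count:

# Launch one background CORSIKA run per steering file, then wait for all.
# Assumes batch*.inp and the CORSIKA executable are in the current directory.
import glob
import subprocess

procs = []
for inp in sorted(glob.glob('batch*.inp')):
    with open(inp) as fin, open(inp.replace('.inp', '.log'), 'w') as fout:
        # the child process inherits the file descriptors, so closing our
        # copies after Popen returns is safe
        procs.append(subprocess.Popen(['./corsika77410Linux_QGSII_urqmd'],
                                      stdin=fin, stdout=fout))

for p in procs:
    p.wait()  # block until every run finishes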

GNU parallel

An easier way to facilitate this is with a couple of bash scripts and a tool called GNU parallel. Included in the simulation-tools/ folder are two such scripts, corsika.sh and runbatch.sh.

At the top of these scripts, you will need to set some variables to locate various directories: your CORSIKA run/ directory, the directory corsikaIOreader is installed in, and the directory you would like to place the output files in. For example,

RUNDIR=/home/nkorzoun/Software/corsika/corsika-77410/run
IODIR=/home/nkorzoun/Software/corsikaIOreader
OUTDIR=/home/nkorzoun/research/panoseti/corsika/batchrun

To quickly test that these scripts work, you can change the ERANGE field in the template to something small, like

ERANGE 1E3 1E3

then run the notebook to generate the batch.inp files, then execute

./runbatch.sh

The scripts assume they are in the same directory as the steering files. Old files in that directory (except the .inp files) are removed when runbatch.sh starts. When completed, the scripts leave the .telescope files in the CORSIKA run directory and the .root files in the corsikaIOreader directory; the .root files are also copied over to OUTDIR.

In runbatch.sh is the line

parallel --verbose --jobs 16 --joblog joblog.txt ./corsika.sh {} {/.} ::: $THISDIR/batch*.inp

This will try to queue up all of the CORSIKA steering files you generated (as many as you want!) and split them across 16 different cores. You will want to adjust the --jobs flag depending on how many cores you want to use on your machine. The jobs are not necessarily queued up in numeric order (a quirk of GNU parallel), which you can verify by looking at joblog.txt when the jobs complete. I usually just monitor their progress by keeping a terminal open with top, or even better, htop.

In the end you will have many root files, which is inconvenient for the purposes of running panodisplay.C (but great for executing the paramCSV function in parallel; see the next section). To combine the root files, simply type something like

hadd -f _merged.root *.root

The underscore is nice for listing the merged file first in a directory filled with hundreds of smaller root files.

Running the pipeline on a cluster with a job scheduler (e.g. with SLURM)

The above method is fine for generating a few hundred showers to develop analysis tools with, but for something statistically significant we will need to run thousands or tens of thousands of showers at minimum. This can take weeks or months on your everyday work computer. In another repository is a compilation of scripts I use to submit jobs on a cluster that uses SLURM as its job scheduler. The workflow is very similar to the above, but modified to work on a shared computing cluster. Run these scripts at your own risk; I'm a novice at high performance computing, and I have gotten emails from system administrators asking about jobs that stalled when failing CORSIKA sims left requirements unsatisfied and disrupted the pipeline. These scripts could still be considered a work in progress, but for now they work well for my purposes with a small amount of supervision.

corsikaBatchGenerator.py is very similar to corsikaBatchGenerator.ipynb, but is written to be more easily called from a terminal. It also allows arguments to be passed in place of editing the controller in the python notebook. Once the steering files are generated, the jobs are submitted in parallel; when CORSIKA completes, corsikaIOreader is also run in parallel to make the root files. Another script is called to run the paramCSV function in panodisplay.C, also in parallel. This generates as many .csv files as there are .root files, which are later stitched together like the root files. The files are then compressed for copying off of the cluster and onto another computer for further analysis. All data are kept in the zip files, which include the original steering files, the .telescope files, all .root files, all .csv files, and the final merged products.
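
Stitching the per-run csv files back together can be done with a few lines of python. This is a sketch, assuming every per-run csv shares the same header row; the batch*.csv file names are hypothetical, so adjust the glob pattern to match what the cluster scripts actually produce:

# Concatenate per-run csv files into one merged table,
# analogous to merging the root files with hadd.
import glob
import pandas as pd

parts = [pd.read_csv(f) for f in sorted(glob.glob('batch*.csv'))]
merged = pd.concat(parts, ignore_index=True)
merged.to_csv('_merged.csv', index=False)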