Setting the production environment - PADME-Experiment/padme-prod GitHub Wiki

These are the steps to set up the PADME production environment for both Reconstruction and MC production.

Database access

The production system is based on a MySQL database which is currently hosted by the central DB service at the INFN Frascati National Laboratory.

All production software accesses this DB using environment variables which must be set as follows:

export PADME_MCDB_HOST=percona.lnf.infn.it
export PADME_MCDB_PORT=3306
export PADME_MCDB_USER=padmeMCDB
export PADME_MCDB_PASSWD=<password>
export PADME_MCDB_NAME=PadmeMCDB

Ask the PADME Offline Software Manager or the LNF Computing Services for the DB password.
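As a quick check that the environment is complete before launching a production, the variables can be read and validated as in the following minimal sketch. The function name read_db_settings and the keys of the returned dictionary are illustrative assumptions and are not taken from the padme-prod code; the sketch does not open a DB connection.

```python
import os

# Names of the environment variables listed above.
REQUIRED_VARS = (
    "PADME_MCDB_HOST",
    "PADME_MCDB_PORT",
    "PADME_MCDB_USER",
    "PADME_MCDB_PASSWD",
    "PADME_MCDB_NAME",
)


def read_db_settings(environ=os.environ):
    """Return the DB access settings as a dict; raise if any variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # The dictionary keys are hypothetical; adapt them to the MySQL client in use.
    return {
        "host": environ["PADME_MCDB_HOST"],
        "port": int(environ["PADME_MCDB_PORT"]),
        "user": environ["PADME_MCDB_USER"],
        "passwd": environ["PADME_MCDB_PASSWD"],
        "db": environ["PADME_MCDB_NAME"],
    }
```

Calling read_db_settings() with no argument checks the current shell environment; passing a plain dictionary makes it easy to test the settings for a different site.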

If you are setting up a system at a different site, make sure a reliable MySQL server is available and create the PadmeMCDB database on it using the schema found in padme-prod/PadmeProd/db/PadmeMCDB_schema.sql (see below for how to retrieve the padme-prod software). Then change the environment variables above to match the new DB access information.

Production software setup

Choose a top directory for the production (any name will do) and download the padme-prod software into it. In the same directory, create a subdirectory named prod to host all log files. It is also useful to create links there to the main production scripts.

[leonardi@padmeui ~]$ mkdir production
[leonardi@padmeui ~]$ cd production
[leonardi@padmeui production]$ git clone https://github.com/PADME-Experiment/padme-prod
[leonardi@padmeui production]$ mkdir prod
[leonardi@padmeui production]$ ln -s padme-prod/PadmeProd/code/PadmeMCProd.py PadmeMCProd
[leonardi@padmeui production]$ ln -s padme-prod/PadmeProd/code/PadmeRecoProd.py PadmeRecoProd
[leonardi@padmeui production]$ ln -s padme-prod/PadmeProd/code/PadmeRecoSubmit.py PadmeRecoSubmit
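The same layout can be created from Python, for example when preparing several production areas. This is only a sketch that mirrors the shell commands above: the function names create_layout and clone_repo are made up for illustration, and the clone step needs git and network access.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/PADME-Experiment/padme-prod"
SCRIPTS = ("PadmeMCProd", "PadmeRecoProd", "PadmeRecoSubmit")


def create_layout(top):
    """Create the prod log directory and the script links under `top`."""
    top = Path(top)
    (top / "prod").mkdir(parents=True, exist_ok=True)
    for name in SCRIPTS:
        link = top / name
        if not link.is_symlink() and not link.exists():
            # Relative target, as in the ln -s commands above.
            link.symlink_to(Path("padme-prod/PadmeProd/code") / (name + ".py"))
    return top


def clone_repo(top):
    """Clone padme-prod into `top` (requires git and network access)."""
    subprocess.run(["git", "clone", REPO_URL], cwd=str(top), check=True)
```

The links use relative targets, so they resolve once padme-prod has been cloned into the same top directory, regardless of the order in which the two functions are called.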

PadmeMCProd

The PadmeMCProd script handles productions of simulated events using the PadmeMC software, which must be installed on CVMFS. See Installing PadmeMC on CVMFS for instructions on how to do this.

The syntax to use this script is:

PadmeMCProd -n <prod_name> -j <number_of_jobs> [-v <version>] [-c <config_file>] [-s <submission_site>] [-C <CE_node> [-P <CE_port>] -Q <CE_queue>] [-d <storage_site>] [-p <proxy>] [-D <description>] [-U <user>] [-N <events>] [-V] [-h]
  -n <prod_name>        Name for the production
  -j <number_of_jobs>   Number of production jobs to submit. Must be >0 and <=1000
  -v <version>          Version of PadmeMC to use for production. Must be installed on CVMFS. Default: develop
  -c <config_file>      Configuration file to use. Default: cfg/<prod_name>.cfg
  -s <submission_site>  Site to be used for job submission. Allowed: LNF,SOFIA,CNAF. Default: LNF
  -C <CE_node>          CE node to be used for job submission. If defined, <submission_site> will not be used
  -P <CE_port>          CE port. Default: 8443
  -Q <CE_queue>         CE queue to use for submission. This parameter is mandatory if -C is specified
  -d <storage_site>     Site where the jobs output will be stored. Allowed: LNF,CNAF. Default: CNAF
  -p <proxy>            Long lived proxy file to use for this production. If not defined it will be created.
  -D <description>      Production description (to be stored in the DB). 'TEST' if not given.
  -U <user>             Name of user who requested the production (to be stored in the DB). 'Unknown' if not given.
  -N <events>           Total number of events requested by user (to be stored in the DB). 0 if not given.
  -V                    Enable debug mode. Can be repeated to increase verbosity
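When PadmeMCProd is driven from another script, the constraints stated in the option list (job count between 1 and 1000, -Q mandatory with -C, allowed site names) can be checked before submission. The helper below is a hypothetical sketch, not part of padme-prod: build_mcprod_command is an invented name, it covers only a subset of the options, and it assumes the PadmeMCProd link sits in the current directory.

```python
def build_mcprod_command(prod_name, n_jobs, version=None, config_file=None,
                         site=None, ce_node=None, ce_port=None, ce_queue=None,
                         storage_site=None, script="./PadmeMCProd"):
    """Return the PadmeMCProd argument list after checking the documented constraints."""
    if not 0 < n_jobs <= 1000:
        raise ValueError("number of jobs must be >0 and <=1000")
    if ce_node is not None and ce_queue is None:
        raise ValueError("-Q <CE_queue> is mandatory when -C is specified")
    if site not in (None, "LNF", "SOFIA", "CNAF"):
        raise ValueError("submission site must be LNF, SOFIA or CNAF")
    if storage_site not in (None, "LNF", "CNAF"):
        raise ValueError("storage site must be LNF or CNAF")

    cmd = [script, "-n", prod_name, "-j", str(n_jobs)]
    optional = (
        ("-v", version), ("-c", config_file), ("-s", site),
        ("-C", ce_node), ("-P", ce_port), ("-Q", ce_queue),
        ("-d", storage_site),
    )
    # Options left as None are omitted, so the script applies its own defaults.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The returned list can be passed directly to subprocess.run(cmd, check=True).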

PadmeRecoProd

The PadmeRecoProd script handles the reconstruction of all rawdata from a single run of the PADME experiment using the PadmeReco software, which must be installed on CVMFS. See Installing PadmeReco on CVMFS for instructions on how to do this.

The syntax to use this script is:

PadmeRecoProd -r <run_name> [-y <year>] [-j <files_per_job>] [-v <version>] [-n <prod_name>] [-s <submission_site>] [-S <source_uri>] [-C <CE_node> [-P <CE_port>] -Q <CE_queue>] [-d <storage_site>] [-p <proxy>] [-D <description>] [-V] [-h]
  -r <run_name>         name of the run to process
  -y <year>             year of run. N.B. used only if run name is not self-documenting
  -v <version>          version of PadmeReco to use for production. Must be installed on CVMFS. Default: develop
  -n <prod_name>        name for the production. Default: <run_name>_<version>
  -j <files_per_job>    number of rawdata files to be reconstructed by each job. Default: 100
  -s <submission_site>  site to be used for job submission. Allowed: LNF,SOFIA,CNAF. Default: LNF
  -S <source_uri>       URI to use to get list of files to process
  -C <CE_node>          CE node to be used for job submission. If defined, <submission_site> will not be used
  -P <CE_port>          CE port. Default: 8443
  -Q <CE_queue>         CE queue to use for submission. This parameter is mandatory if -C is specified
  -d <storage_site>     site where the jobs output will be stored. Allowed: LNF,CNAF. Default: LNF
  -p <proxy>            Long lived proxy file to use for this production. If not defined it will be created.
  -D <description>      Production description (to be stored in the DB). 'TEST' if not given.
  -V                    enable debug mode. Can be repeated to increase verbosity
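Two of the defaults above can be illustrated with a little arithmetic: the production name is built as <run_name>_<version>, and the number of jobs follows from the number of rawdata files and the -j value. The sketch below assumes that files are split into consecutive groups of files_per_job, with a smaller last group; the actual script may split them differently. Both function names are hypothetical.

```python
import math


def default_prod_name(run_name, version="develop"):
    """Default production name as documented: <run_name>_<version>."""
    return f"{run_name}_{version}"


def number_of_jobs(n_files, files_per_job=100):
    """Jobs needed if each job takes at most files_per_job files (assumed splitting)."""
    if n_files < 0 or files_per_job <= 0:
        raise ValueError("n_files must be >=0 and files_per_job must be >0")
    return math.ceil(n_files / files_per_job)
```

Under this assumption, a run with 1150 rawdata files and the default of 100 files per job would be reconstructed by 12 jobs, the last one processing 50 files.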

PadmeRecoSubmit

The PadmeRecoSubmit script is used to start the reconstruction of multiple runs with a single command.

The syntax to use this script is:

PadmeRecoSubmit [-L <run_list_file>] [-r <run_name>] [-j <files_per_job>] [-v <version>] [-s <submission_site>] [-Q <CE_queue>] [-P <CE_port>] [-S <source_uri>] [-d <storage_site>] [-D <submit_delay>] [-V] [-h]
  -L <run_list_file>    file with list of runs to process
  -r <run_name>         name of run to process
  -v <version>          version of PadmeReco to use for production. Must be installed on CVMFS. Default: develop
  -j <files_per_job>    number of rawdata files to be reconstructed by each job. Default: 100
  -s <submission_site>  site to be used for job submission. Allowed: LNF,SOFIA,CNAF. Default: LNF
  -P <CE_port>          CE port. Default: 8443
  -Q <CE_queue>         CE queue to use for submission. Default from submission site
  -S <source_uri>       URI to use to get list of files for production run
  -d <storage_site>     site where the jobs output will be stored. Allowed: LNF,CNAF. Default: LNF
  -D <submit_delay>     Delay in sec between run submissions. Default: 60 sec
  -V                    enable debug mode. Can be repeated to increase verbosity
  N.B. Multiple -L and -r options can be combined to create a single list of runs. Duplicated runs will be automatically removed.
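The way several -L and -r options collapse into one duplicate-free list can be sketched as follows. The run list file format (one run name per line, blank lines and lines starting with # ignored) and the ordering (files first, then single runs) are assumptions made for this example, as are the function names.

```python
def read_run_list(path):
    """Read run names from a file, one per line (assumed format)."""
    runs = []
    with open(path) as handle:
        for line in handle:
            name = line.strip()
            if name and not name.startswith("#"):
                runs.append(name)
    return runs


def merge_runs(list_files=(), runs=()):
    """Combine runs from list files and single runs, dropping duplicates but keeping order."""
    merged = []
    seen = set()
    candidates = [run for path in list_files for run in read_run_list(path)]
    candidates.extend(runs)
    for run in candidates:
        if run not in seen:
            seen.add(run)
            merged.append(run)
    return merged
```

A run that appears both in a list file and in a -r option is therefore submitted only once.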

Additional tools

Several additional scripts to obtain information about the current state of the system are available in the padme-prod/tools directory. These tools are specifically configured for the LNF Tier2 and need modifications to work at a different site.

show_mcprod.sh, show_recoprod.sh, show_prod.sh, show_jobs.sh, show_submit.sh, and show_file.sh show nicely formatted views of the database content. With no argument, each command shows a full view of the database. If the name of a production is given, only information related to that production is shown. E.g.

[leonardi@padmeui prod]$ tools/show_jobs.sh run_0000000_20190211_080340_test20190910
+------------------------------------------+----------+------------+----------+----------+---------------------+---------------------+
| production                               | job      | job status | events   | files    | time created        | time completed      |
+------------------------------------------+----------+------------+----------+----------+---------------------+---------------------+
| run_0000000_20190211_080340_test20190910 | job00000 | 2 Success  |  100,000 |        1 | 2019-11-07 08:24:24 | 2019-11-07 19:59:04 |
| run_0000000_20190211_080340_test20190910 | job00001 | 2 Success  |  100,000 |        1 | 2019-11-07 08:24:24 | 2019-11-07 23:12:01 |
| run_0000000_20190211_080340_test20190910 | job00002 | 2 Success  |  100,000 |        1 | 2019-11-07 08:24:24 | 2019-11-07 22:54:02 |
| run_0000000_20190211_080340_test20190910 | job00003 | 2 Success  |  100,000 |        1 | 2019-11-07 08:24:24 | 2019-11-07 22:50:28 |
| run_0000000_20190211_080340_test20190910 | job00004 | 1 Active   | NULL     | NULL     | 2019-11-07 08:24:24 | NULL                |
| run_0000000_20190211_080340_test20190910 | job00005 | 1 Active   | NULL     | NULL     | 2019-11-07 08:24:24 | NULL                |
| run_0000000_20190211_080340_test20190910 | job00006 | 2 Success  |  100,000 |        1 | 2019-11-07 08:24:24 | 2019-11-07 23:21:55 |
| run_0000000_20190211_080340_test20190910 | job00007 | 2 Success  |  100,000 |        1 | 2019-11-07 08:24:24 | 2019-11-07 23:47:18 |
| run_0000000_20190211_080340_test20190910 | job00008 | 1 Active   | NULL     | NULL     | 2019-11-07 08:24:24 | NULL                |
| run_0000000_20190211_080340_test20190910 | job00009 | 1 Active   | NULL     | NULL     | 2019-11-07 08:24:24 | NULL                |
| run_0000000_20190211_080340_test20190910 | job00010 | 2 Success  |  100,000 |        1 | 2019-11-07 08:24:24 | 2019-11-08 04:14:55 |
| run_0000000_20190211_080340_test20190910 | job00011 | 2 Success  |   31,118 |        1 | 2019-11-07 08:24:24 | 2019-11-07 09:48:01 |
+------------------------------------------+----------+------------+----------+----------+---------------------+---------------------+
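To summarise a long listing, the table printed by show_jobs.sh can be parsed and the jobs counted by status. The following sketch relies only on the layout visible in the example above (rows delimited by | and border lines starting with +); parse_table and count_by_status are illustrative names, not tools shipped with padme-prod.

```python
from collections import Counter


def parse_table(text):
    """Turn a MySQL-style text table into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines and anything else
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first | row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def count_by_status(rows):
    """Count jobs per value of the 'job status' column."""
    return Counter(row["job status"] for row in rows)
```

Feeding the captured output of tools/show_jobs.sh to parse_table and then to count_by_status gives the number of jobs in each state, for example how many are still Active.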

report_jobs.py shows a list of all jobs currently registered on the LNF Tier2 with their status and location (if available). E.g.

[leonardi@padmeui prod]$ tools/report_jobs.py
https://atlasce1.lnf.infn.it:8443/CREAM065063790   [email protected]    REALLY-RUNNING
https://atlasce1.lnf.infn.it:8443/CREAM143754385   padme003@N/A                       REALLY-RUNNING
https://atlasce1.lnf.infn.it:8443/CREAM197909060   padme003@N/A                       REALLY-RUNNING
https://atlasce1.lnf.infn.it:8443/CREAM503695654   padme003@N/A                       REALLY-RUNNING
https://atlasce1.lnf.infn.it:8443/CREAM568084335   padme003@N/A                       REALLY-RUNNING
https://atlasce1.lnf.infn.it:8443/CREAM602645134   padme003@N/A                       REALLY-RUNNING
https://atlasce1.lnf.infn.it:8443/CREAM654251750   padme003@N/A                       REALLY-RUNNING
https://atlasce1.lnf.infn.it:8443/CREAM978165918   [email protected]    REALLY-RUNNING
https://atlasce4.lnf.infn.it:8443/CREAM814793478   [email protected]    RUNNING
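The report can be condensed in a similar way, for instance to see how many jobs each CE is handling. This sketch assumes only what the example shows: each line starts with the job URL and ends with the status, with the location in between. The function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def parse_report(text):
    """Parse report_jobs.py output lines into dicts with job_id, location and status."""
    jobs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not parts[0].startswith("https://"):
            continue  # not a job line
        jobs.append({
            "job_id": parts[0],
            "location": " ".join(parts[1:-1]),
            "status": parts[-1],
        })
    return jobs


def count_by_ce_and_status(jobs):
    """Count jobs per (CE host name, status) pair."""
    return Counter((urlparse(job["job_id"]).hostname, job["status"]) for job in jobs)
```

The first field is treated as an opaque job identifier; only its host name is extracted to group jobs by CE.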

If one or more jobs of a production fail, the delete_prod.py script can be used to rename all files related to that production to prod_name_deleted_XX so that the same production can be resubmitted. No files are actually deleted by this operation. A FAKE mode is available to show the commands that the script would execute without running them.

[leonardi@padmeui prod]$ tools/delete_prod.py -f -p run_0000000_20190724_151755_test20190910
FAKE mode enabled

===    1/1    === Deleting prod run_0000000_20190724_151755_test20190910 ===
> voms-proxy-info
- Production run_0000000_20190724_151755_test20190910 has id 1218
Renaming production run_0000000_20190724_151755_test20190910 to run_0000000_20190724_151755_test20190910_deleted_00
Moving log files from prod/test20190910/run_0000000_20190724_151755_test20190910 to prod/test20190910/run_0000000_20190724_151755_test20190910_deleted_00
Moving output files on srm://atlasse.lnf.infn.it:8446/srm/managerv2?SFN=/dpm/lnf.infn.it/home/vo.padme.org from /daq/2019/recodata/test20190910/run_0000000_20190724_151755_test20190910 to /daq/2019/recodata/test20190910/run_0000000_20190724_151755_test20190910_deleted_00
os.rename("prod/test20190910/run_0000000_20190724_151755_test20190910","prod/test20190910/run_0000000_20190724_151755_test20190910_deleted_00")
> gfal-rename srm://atlasse.lnf.infn.it:8446/srm/managerv2?SFN=/dpm/lnf.infn.it/home/vo.padme.org/daq/2019/recodata/test20190910/run_0000000_20190724_151755_test20190910/run_0000000_20190724_151755_test20190910_job00000_reco.root srm://atlasse.lnf.infn.it:8446/srm/managerv2?SFN=/dpm/lnf.infn.it/home/vo.padme.org/daq/2019/recodata/test20190910/run_0000000_20190724_151755_test20190910/run_0000000_20190724_151755_test20190910_deleted_00_job00000_reco.root
UPDATE file SET name = run_0000000_20190724_151755_test20190910_deleted_00_job00000_reco.root WHERE id = 7161
> gfal-rename srm://atlasse.lnf.infn.it:8446/srm/managerv2?SFN=/dpm/lnf.infn.it/home/vo.padme.org/daq/2019/recodata/test20190910/run_0000000_20190724_151755_test20190910 srm://atlasse.lnf.infn.it:8446/srm/managerv2?SFN=/dpm/lnf.infn.it/home/vo.padme.org/daq/2019/recodata/test20190910/run_0000000_20190724_151755_test20190910_deleted_00
UPDATE production SET storage_dir = /daq/2019/recodata/test20190910/run_0000000_20190724_151755_test20190910_deleted_00 WHERE id = 1218
UPDATE production SET name = run_0000000_20190724_151755_test20190910_deleted_00 WHERE id = 1218
Production run_0000000_20190724_151755_test20190910 was "deleted"
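The renaming convention visible in this log can be reproduced in a few lines. The sketch below assumes that XX is the first unused two-digit index for that production; how delete_prod.py actually picks the index is not documented here, and both function names are invented for the example.

```python
import re


def next_deleted_name(prod_name, existing_names):
    """Return prod_name_deleted_XX using the first free two-digit index (assumed rule)."""
    pattern = re.compile(re.escape(prod_name) + r"_deleted_(\d{2})")
    used = set()
    for name in existing_names:
        match = pattern.fullmatch(name)
        if match:
            used.add(int(match.group(1)))
    for index in range(100):
        if index not in used:
            return f"{prod_name}_deleted_{index:02d}"
    raise ValueError("no free _deleted_XX index left for " + prod_name)


def renamed_file(file_name, prod_name, new_prod_name):
    """Replace the leading production name in a file name, as in the log above."""
    if not file_name.startswith(prod_name):
        raise ValueError("file name does not start with the production name")
    return new_prod_name + file_name[len(prod_name):]
```

With the production from the example, the first deletion yields the _deleted_00 suffix and the output file becomes run_0000000_20190724_151755_test20190910_deleted_00_job00000_reco.root, matching the gfal-rename line in the log.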