How to Handle Data Files on GRID

A data file here means a binary file that is too large to be included in a set of user-level analysis code, namely a real or simulated DST file. In most cases such files are handled properly by the default scripts and macros under e1039-analysis. This page explains the special cases that need extra handling by each user.

Input Data File

Below is the recommended procedure, as of 2020-September-07, for using a real DST file as input to an analysis. The collaboration should establish a more convenient procedure if this kind of access is needed more frequently.

  1. Suppose the input is a real DST file, for example /data2/e1039/dst/run_001795_spin.root.
  2. Copy the file to /pnfs/e1039/scratch/$USER/dst.
    • The copy is needed because /data2 is not accessible during job submission.
    • Another directory under /pnfs/e1039 also works, although /pnfs/e1039/scratch should be the fastest.
  3. Pass the file location to jobsub_submit via the -f option in gridsub.sh.
    • Add the following line: cmd="$cmd -f /pnfs/e1039/scratch/$USER/dst/run_001795_spin.root".
  4. Use $CONDOR_DIR_INPUT/run_001795_spin.root as the input file name in gridrun.sh.
    • Neither /pnfs/... nor /data2/... is accessible in gridrun.sh.
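
The steps above can be sketched as three small shell helpers. This is only an illustration: the function names (stage_dst, add_input_opt, grid_input_path) are hypothetical and are not part of e1039-analysis, and the sketch assumes that gridsub.sh accumulates the jobsub_submit command line in a string variable named cmd, as in step 3.

```shell
# Sketch only: all helper names below are hypothetical, not part of e1039-analysis.

# Step 2: copy a DST file into a staging directory under /pnfs.
#   $1 = source file (e.g. under /data2), $2 = destination directory
stage_dst() {
    mkdir -p "$2" && cp "$1" "$2/"
}

# Step 3: append "-f <staged file>" to a jobsub_submit command string.
#   $1 = current command string, $2 = staged file path
add_input_opt() {
    echo "$1 -f $2"
}

# Step 4: build the path under which the job sees the transferred file.
#   $1 = staged file path; only its base name is kept
grid_input_path() {
    echo "$CONDOR_DIR_INPUT/$(basename "$1")"
}

# Possible use on an interactive node and in gridsub.sh:
#   stage_dst /data2/e1039/dst/run_001795_spin.root /pnfs/e1039/scratch/$USER/dst
#   cmd=$(add_input_opt "$cmd" /pnfs/e1039/scratch/$USER/dst/run_001795_spin.root)
# Possible use in gridrun.sh:
#   input=$(grid_input_path run_001795_spin.root)
```

Keeping the staged path in one place and deriving the in-job name with basename avoids a mismatch between the name given to -f and the name opened in gridrun.sh.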

Output Data File

No special case is anticipated. All output files should be created under $CONDOR_DIR_OUTPUT, the directory that corresponds to the -d OUTPUT option. Note that a job will get stuck if the -d OUTPUT option is given to jobsub_submit but no output file is created in $CONDOR_DIR_OUTPUT.
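
One possible guard against the stuck-job case is to check, at the end of gridrun.sh, that the job's output directory is not empty and to write a small marker file if it is. The helper below is a hypothetical sketch, not part of e1039-analysis; the marker file name no_output.txt is an arbitrary choice, and the output directory is passed as an argument.

```shell
# Hypothetical helper: make sure the output directory holds at least one file.
#   $1 = output directory of the job
ensure_output_not_empty() {
    mkdir -p "$1"
    # "ls -A" prints nothing for an empty directory.
    if [ -z "$(ls -A "$1" 2>/dev/null)" ]; then
        echo "No output file was produced ($(date))." > "$1/no_output.txt"
    fi
}

# Possible use as the last step of gridrun.sh, with OUT_DIR (hypothetical)
# set to the job's output directory:
#   ensure_output_not_empty "$OUT_DIR"
```

The marker file only keeps the transfer step from waiting on an empty directory; it also makes failed jobs easy to spot afterwards, since their output contains no_output.txt instead of a DST or histogram file.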