data file on grid - E1039-Collaboration/e1039-wiki GitHub Wiki
How to Handle Data Files on GRID
A data file here means a binary file that is too large to be included in the user-level analysis code, namely a real or simulated DST file.
In most cases it is handled properly by default by the scripts and macros under `e1039-analysis`.
This page explains the special cases that need extra handling by each user.
Input Data File
Below is a workable procedure for feeding a real DST file into an analysis job, as of 2020-September-07. The collaboration should establish a better procedure if this kind of access is needed more often.
- Suppose we read a real DST file, `/data2/e1039/dst/run_001795_spin.root`, for example.
- Copy the file to `/pnfs/e1039/scratch/$USER/dst`.
  - Because `/data2` is not accessible during the job submission.
  - Another directory under `/pnfs/e1039` is OK, although `/pnfs/e1039/scratch` should be fastest.
- Tell the file location to `jobsub_submit` via `-f` in `gridsub.sh`.
  - By adding the following line:
    `cmd="$cmd -f /pnfs/e1039/scratch/$USER/dst/run_001795_spin.root"`
- Use `$CONDOR_DIR_INPUT/run_001795_spin.root` as an input file name in `gridrun.sh`.
  - Neither `/pnfs/...` nor `/data2/...` is accessible in `gridrun.sh`.
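
The steps above can be sketched as a dry run that only prints what would be done. This is a minimal illustration, not the actual contents of `gridsub.sh` or `gridrun.sh`: the variable names `src_file`, `stage_dir`, `staged_file`, and `input_name` are hypothetical, the bare `jobsub_submit` placeholder stands in for whatever options the real script sets, and a plain `cp` onto `/pnfs` is an assumption.

```shell
#!/bin/bash
# Dry-run sketch of the input-file procedure. Nothing is copied or submitted.
# All variable names except cmd and CONDOR_DIR_INPUT are hypothetical.

# Example DST file from this page and the staging directory on /pnfs.
# ${USER:-user} falls back to "user" if USER is unset (only for this sketch).
src_file=/data2/e1039/dst/run_001795_spin.root
stage_dir=/pnfs/e1039/scratch/${USER:-user}/dst
staged_file=$stage_dir/$(basename "$src_file")

# Step 1: copy the file by hand on a node that can see both /data2 and /pnfs.
# (Assumption: a plain cp works there; the command is printed, not executed.)
echo "mkdir -p $stage_dir && cp $src_file $staged_file"

# Step 2 (in gridsub.sh): append the -f option to the jobsub_submit command line.
cmd="jobsub_submit"            # placeholder for the command built by gridsub.sh
cmd="$cmd -f $staged_file"
echo "$cmd"

# Step 3 (in gridrun.sh): refer to the file through $CONDOR_DIR_INPUT.
# The variable is kept literal here because it only exists on the worker node.
input_name="\$CONDOR_DIR_INPUT/$(basename "$src_file")"
echo "$input_name"
```

Only the base name of the file carries over to the worker node, so the name used in `gridrun.sh` must match the base name given to `-f`.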
Output Data File
No special case is anticipated.
All output files should be created under `$CONDOR_DIR_OUTPUT`.
Note that a job will get stuck if a `-d OUTPUT` option is given to `jobsub_submit` but no output file is created in `$CONDOR_DIR_OUTPUT`.
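
One possible guard against the stuck-job case is to make sure the output directory is never empty when the job script ends. The sketch below is a hypothetical addition to the end of `gridrun.sh`, not something the `e1039-analysis` scripts are known to do. It assumes that the directory requested with `-d OUTPUT` is exposed to the job as `$CONDOR_DIR_OUTPUT`, following the `-d<tag>` naming convention of `jobsub_submit`; the marker file name `empty_output.txt` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical guard for the end of a job script.
# Outside a grid job CONDOR_DIR_OUTPUT is unset, so fall back to a temporary
# directory to allow a local test run.
out_dir=${CONDOR_DIR_OUTPUT:-$(mktemp -d)}

# ... the analysis macro would write its output files into $out_dir here ...

# If nothing was written, leave a small marker file so that the directory
# is not empty when the job finishes.
if [ -z "$(ls -A "$out_dir")" ]; then
    echo "no output produced" > "$out_dir/empty_output.txt"
fi

# List what will be transferred back.
ls "$out_dir"
```

A marker file also makes it easy to spot, after the job has finished, which jobs produced no real output.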