Submit jobs to the grid - E1039-Collaboration/e1039-wiki GitHub Wiki
FermiGrid and Open Science Grid (OSG)
- SpinQuest can make use of the computing resources at Fermilab (FermiGrid) and also the Open Science Grid (OSG).
- A set of commands for job submission is provided by Fermilab at the interactive GPVM nodes for SpinQuest: `spinquestgpvm01.fnal.gov` and `gpvm02`.
- The SpinQuest software (i.e. `resource`, `share` and `core`) is available at the interactive GPVM nodes and the Grid computers.
Environment Setup
- Log in to the submitter node via SSH. Your Kerberos account has to be active.
- The job-submission commands are ready for use (i.e. in the `PATH`) by default.
- You are recommended to use a wrapper script (`/exp/seaquest/app/software/script/jobsub_submit_spinquest.sh`) when submitting jobs. It properly sets several options specific to SpinQuest.
- Many packages in `e1039-analysis` include a script (`gridsub.sh`) to submit multiple jobs at once (or multiple times). You are recommended to use one of them (as explained below) to understand how it works.
Basic Usage
You can walk through the whole job-submission process by using `e1039-analysis/SimChainDev/` as follows.
- Log in to `spinquestgpvm01.fnal.gov`.
- Clone the `e1039-analysis` repository if you haven't done so yet.
- `cd /path/to/your/e1039-analysis/SimChainDev`.
- Execute the program locally as a test: `./gridsub.sh test 0 1 100`
  - `test` is the name of this job, which is used as the name of a new directory to store job outputs.
  - `0` means that the job runs locally.
  - `1` is the number of jobs.
  - `100` is the number of events per job.
  - Job outputs will appear under `scratch/test/`.
- Get a Kerberos ticket by executing `kinit`.
- Submit two small jobs as a test: `./gridsub.sh test-01 1 2 100`
  - You might be asked to copy and paste an authentication URL into your web browser.
  - The 2nd argument is `1`, which means that the jobs run on the Grid.
  - Job outputs will appear under `data/test-01/`, where the top directory is not `scratch` but `data` when the Grid is used.
- Monitor your submitted jobs by executing `jobsub_q --group spinquest --user=$USER` from time to time. Wait until all of them have finished.
- Submit more jobs: `./gridsub.sh test-02 1 100 1000`
- Monitor your submitted jobs until they have finished.
  - You might use FIFE monitor to check the job status.
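
The argument convention used in the steps above can be summarized in a short sketch. The helper names `gridsub_output_dir` and `gridsub_total_events` are hypothetical and are not part of `gridsub.sh`. They only mirror the rules stated above: mode `0` (local) writes under `scratch/`, mode `1` (Grid) writes under `data/`, and the total number of events is the number of jobs times the number of events per job.

```shell
# Hypothetical helpers (not part of gridsub.sh) that mirror the documented
# meaning of "./gridsub.sh NAME MODE N_JOBS N_EVENTS".

# Print the directory where the job outputs are documented to appear.
gridsub_output_dir() {
    name=$1   # 1st argument: job name, reused as the output directory name
    mode=$2   # 2nd argument: 0 = run locally, 1 = run on the Grid
    case "$mode" in
        0) echo "scratch/$name/" ;;
        1) echo "data/$name/" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Print the total number of events requested by one submission.
gridsub_total_events() {
    n_jobs=$1     # 3rd argument of gridsub.sh
    n_events=$2   # 4th argument of gridsub.sh
    echo $((n_jobs * n_events))
}

gridsub_output_dir test 0        # local test run
gridsub_output_dir test-01 1     # small Grid submission
gridsub_total_events 100 1000    # size of the "test-02" submission
```

Checking the total before a large submission is a cheap way to confirm that the third and fourth arguments were not swapped.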
Write Permission
- Normal E1039 users can
  - Write output files only under `/pnfs/e1039/scratch/users` and
  - Read all files under `/pnfs/e1039` and `/pnfs/e906`.
- Only granted users can write output files to any directory under `/pnfs/e1039`.
  - Mainly for data production.
  - The granted users (i.e. assigned the `Production` role) can be found at this Fifemon page. If the link does not work, you can manually
    - Visit Fifemon,
    - Search for `User Info` to open that dashboard, and
    - Set `Experiments` to `spinquest` and `Role` to `Production`.
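
As a minimal sketch of the rule for normal users, the check below tests whether an output path lies under `/pnfs/e1039/scratch/users`. The function name `is_user_writable_pnfs`, the user name `alice` and the path `/pnfs/e1039/data/out.root` are hypothetical and serve only as illustration. The check assumes an absolute path that is already normalized (no `..` components), and it does not query the actual file-system permissions.

```shell
# Hypothetical helper: succeed only if the given path lies under the area
# that a normal (non-Production) user is documented to be able to write to.
# Assumption: the path is absolute and contains no ".." components.
is_user_writable_pnfs() {
    case "$1" in
        /pnfs/e1039/scratch/users/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Both paths below are illustrative, not real files.
for p in /pnfs/e1039/scratch/users/alice/out.root /pnfs/e1039/data/out.root; do
    if is_user_writable_pnfs "$p"; then
        echo "ok:     $p"
    else
        echo "denied: $p"
    fi
done
```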
Notes
- The back-end system of the job submission at Fermilab was changed to `jobsub_lite` in Feb. 2023. You can find details in DocDB 10460.
Tips
Standalone test of job submission
When you encounter an error during job submission, you might run this simple command to debug: `jobsub_submit -G spinquest file:///usr/bin/printenv`.
The error might be resolved by reinitializing Condor and/or Kerberos tickets: `htdestroytoken ; kdestroy ; kinit`.
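
The two steps above can be combined into one small script, sketched below. The function names `run` and `probe_submission` and the `DRY_RUN` variable are hypothetical conveniences, not part of `jobsub_lite`. With `DRY_RUN=1` the commands are only printed, which makes it possible to inspect the sequence on a machine where the tools are not installed.

```shell
# Sketch: reset the tokens and tickets, then submit the trivial printenv job.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

probe_submission() {
    run htdestroytoken          # discard the cached token
    run kdestroy                # discard the Kerberos ticket
    run kinit || return 1       # obtain a fresh ticket; stop if this fails
    run jobsub_submit -G spinquest file:///usr/bin/printenv
}

# Print the command sequence without executing anything.
DRY_RUN=1 probe_submission
```

On a submitter node, calling `probe_submission` without `DRY_RUN` runs the same commands for real.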
How to exclude bad OSG nodes
Use the "--append_condor_requirements" option of "jobsub_submit" as follows:
cmd="$cmd --append_condor_requirements='(TARGET.GLIDEIN_Site isnt \"UCSD\")'"
It is indeed used by default in "e1039-analysis/SimChainDev/gridsub.sh". Valid site names are listed in this Wiki page. Note that the "--blacklist" option has a known defect according to the Fermilab Service Desk as of August 2019.
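
To exclude more than one site, the individual conditions can be joined with the ClassAd `&&` operator. The following is a minimal sketch: the helper `build_site_exclusion` is hypothetical, and `SITE_B` is a placeholder rather than a real site name. How the resulting string has to be quoted when it is passed on to `jobsub_submit` depends on the calling script, so the last line only prints the option.

```shell
# Hypothetical helper: join several site names into one ClassAd expression
# of the same form as the single-site example above.
build_site_exclusion() {
    expr_out=""
    for site in "$@"; do
        clause="(TARGET.GLIDEIN_Site isnt \"$site\")"
        if [ -z "$expr_out" ]; then
            expr_out=$clause
        else
            expr_out="$expr_out && $clause"
        fi
    done
    printf '%s\n' "$expr_out"
}

# "UCSD" is taken from the example above; "SITE_B" is a placeholder.
requirements=$(build_site_exclusion UCSD SITE_B)
echo "--append_condor_requirements='$requirements'"
```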
Old Information
How Grid Works
Page 9 of docDB#5509
Old setup script
- Log on to a submitter node, e.g. `spinquestgpvm01.fnal.gov` or `gpvm02`. On other computers the following steps won't work.
- Source the setup script as follows. It sets up a bunch of shell variables and commands for job submission. There are many similar scripts for this purpose, but this one will be our officially-maintained version.
source /exp/seaquest/app/software/script/setup-jobsub-spinquest.sh
- You can add the following line to your ".bashrc" in order to auto-source the setup script.
test ${HOSTNAME:0:13} = 'spinquestgpvm' && source /exp/seaquest/app/software/script/setup-jobsub-spinquest.sh
- You can source "setup-jobsub-seaquest.sh" in the same directory to select the SeaQuest jobsub environment instead.
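
The ".bashrc" line above sources the setup script only when the first 13 characters of the host name are `spinquestgpvm`. The sketch below shows the same prefix test with a POSIX `case` pattern instead of the bash-only `${HOSTNAME:0:13}` substring. The function name `is_spinquest_gpvm` and the host `login.example.org` are hypothetical.

```shell
# Sketch of the host check from the .bashrc line, written so that it also
# works in shells without the ${VAR:offset:length} substring syntax.
is_spinquest_gpvm() {
    case "$1" in
        spinquestgpvm*) return 0 ;;
        *) return 1 ;;
    esac
}

# The second host name is illustrative only.
for h in spinquestgpvm01.fnal.gov login.example.org; do
    if is_spinquest_gpvm "$h"; then
        echo "$h: would source the setup script"
    else
        echo "$h: skipped"
    fi
done
```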