Submit jobs to the grid

FermiGrid and Open Science Grid (OSG)

  • SpinQuest can make use of the computing resources at Fermilab (FermiGrid) and also the Open Science Grid (OSG).
  • Fermilab provides a set of job-submission commands on the interactive GPVM nodes for SpinQuest: spinquestgpvm01.fnal.gov and gpvm02.
  • The SpinQuest software (i.e. resource, share and core) is available at the interactive GPVM nodes and the Grid computers.

Environment Setup

  • Log in to the submitter node via SSH. Your Kerberos account must be active.
  • The job-submission commands are ready for use (i.e. in the PATH) by default.
  • We recommend using a wrapper script (/exp/seaquest/app/software/script/jobsub_submit_spinquest.sh) when submitting jobs. It sets several SpinQuest-specific options properly.
  • Many packages in e1039-analysis include a script (gridsub.sh) to submit multiple jobs at once (or multiple times). We recommend trying one of them, as explained below, to understand how it works.

Basic Usage

You can walk through the whole job-submission process using e1039-analysis/SimChainDev/ as follows.

  1. Log in to spinquestgpvm01.fnal.gov.
  2. Clone the e1039-analysis repository if you haven't done so yet.
  3. cd /path/to/your/e1039-analysis/SimChainDev.
  4. Run the program locally as a test:
    ./gridsub.sh test 0 1 100
    
    • test is the name of this job, which is used as the name of a new directory to store job outputs.
    • 0 means that the job runs locally.
    • 1 is the number of jobs.
    • 100 is the number of events per job.
    • Job outputs will appear under scratch/test/.
  5. Get a Kerberos ticket by executing kinit.
  6. Submit two small jobs as a test:
    ./gridsub.sh test-01 1 2 100
    
    • You might be asked to copy and paste an authentication URL to your web browser.
    • The 2nd argument is 1, which means that the jobs run on the grid.
    • Job outputs will appear under data/test-01/; the top directory is data rather than scratch when the grid is used.
  7. Monitor your submitted jobs by executing jobsub_q --group spinquest --user=$USER from time to time. Wait until all jobs have finished.
  8. Submit more jobs:
    ./gridsub.sh test-02 1 100 1000
    
  9. Monitor your submitted jobs until they have all finished.
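
The four positional arguments used in the steps above can be summarized with a small sketch. This is not the actual gridsub.sh: the function name describe_gridsub_args is hypothetical, and it encodes only the argument meanings and output directories stated above (0 = local under scratch/, 1 = grid under data/).

```shell
#!/bin/bash
# Hypothetical helper, not part of e1039-analysis. It mirrors the argument
# meanings described above for "./gridsub.sh NAME MODE N_JOB N_EVT".
describe_gridsub_args() {
    local name=$1 mode=$2 n_job=$3 n_evt=$4
    local where top
    case "$mode" in
        0) where="local"; top="scratch" ;;  # local run writes under scratch/
        1) where="grid";  top="data"    ;;  # grid run writes under data/
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "$n_job job(s) x $n_evt events, run=$where, output=$top/$name/"
}

describe_gridsub_args test    0 1 100
describe_gridsub_args test-01 1 2 100
```

Reading the two calls side by side makes it easier to see that only the 2nd argument decides whether jobs go to the grid and where their outputs land.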

Write Permission

  • A normal E1039 user can
    • write output files only under /pnfs/e1039/scratch/users, and
    • read all files under /pnfs/e1039 and /pnfs/e906.
  • Only users who have been granted permission can write output files to any directory under /pnfs/e1039.
    • Mainly for data production.
    • The granted users (i.e. those assigned the Production role) can be found at this Fifemon page. If the link does not work, you can manually
      1. Visit Fifemon,
      2. Search for User Info to open that dashboard, and
      3. Set Experiments to spinquest and Role to Production.
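
Because a normal user can write only under /pnfs/e1039/scratch/users, it can help to check an output directory before submitting. The following is a minimal sketch of such a check. The function name is_user_writable and the example sub-path ($USER/test-01) are assumptions for illustration, not an official tool or layout. It compares path prefixes only and does not resolve ".." or symbolic links.

```shell
#!/bin/bash
# Hypothetical pre-submission check (not an official tool): succeeds when
# the given path lies under the area that normal users may write to.
USER_WRITABLE_TOP=/pnfs/e1039/scratch/users

is_user_writable() {
    case "$1" in
        "$USER_WRITABLE_TOP"/*) return 0 ;;  # inside the user scratch area
        *)                      return 1 ;;  # anywhere else
    esac
}

# Example sub-path; the per-user layout below is an assumption.
out_dir="$USER_WRITABLE_TOP/$USER/test-01"
if is_user_writable "$out_dir"; then
    echo "OK for normal users: $out_dir"
else
    echo "Production role needed: $out_dir"
fi
```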

Notes

  • The back-end system of the job submission at Fermilab was changed to jobsub_lite in Feb. 2023. You can find details in DocDB 10460.

Tips

Standalone test of job submission

When you encounter an error during job submission, you can run this simple command to debug: jobsub_submit -G spinquest file:///usr/bin/printenv. The error might be resolved by reinitializing the Condor and/or Kerberos tickets: htdestroytoken ; kdestroy ; kinit.
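
The sequence above can be collected into one small script. This is a sketch that uses only the commands named in this section. The run wrapper, the reset_and_test function and the DRY_RUN variable are hypothetical additions, so that the steps can be previewed without touching any credentials.

```shell
#!/bin/bash
# Sketch of the debugging sequence above. With DRY_RUN=1 (the default here)
# each command is only printed; set DRY_RUN=0 on a submitter node to run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_and_test() {
    run htdestroytoken        # discard the cached job-submission token
    run kdestroy              # discard the current Kerberos ticket
    run kinit || return 1     # obtain a fresh Kerberos ticket
    run jobsub_submit -G spinquest file:///usr/bin/printenv
}

reset_and_test
```

If the printenv test job submits cleanly after the reset, the original failure most likely came from stale credentials rather than from your own job script.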

How to exclude bad OSG nodes

Use the "--append_condor_requirements" option of "jobsub_submit" as follows:

 cmd="$cmd --append_condor_requirements='(TARGET.GLIDEIN_Site isnt \"UCSD\")'"

This option is in fact used by default in "e1039-analysis/SimChainDev/gridsub.sh". Valid site names are listed in this Wiki page. Note that the "--blacklist" option has a known defect according to the Fermilab Service Desk as of August 2019.
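
To exclude more than one site, the same requirement can be repeated and joined with "&&". The sketch below builds such an expression. The helper name exclude_sites_expr is hypothetical, "SiteB" is a placeholder rather than a real site name, and joining clauses with "&&" is assumed to be acceptable to the option, since the value is an ordinary ClassAd expression. The script only prints the resulting command line.

```shell
#!/bin/bash
# Hypothetical helper: build one ClassAd expression that excludes every
# site name passed as an argument.
exclude_sites_expr() {
    local expr="" site
    for site in "$@"; do
        if [ -n "$expr" ]; then
            expr="$expr && "              # join clauses with logical AND
        fi
        expr="${expr}(TARGET.GLIDEIN_Site isnt \"$site\")"
    done
    echo "$expr"
}

# "SiteB" is a placeholder; use names from the list of valid sites.
req=$(exclude_sites_expr UCSD SiteB)
cmd="jobsub_submit -G spinquest"
cmd="$cmd --append_condor_requirements='$req'"
echo "$cmd"
```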

Old Information

How Grid Works

See page 9 of DocDB 5509.

Old setup script

  • Log on to a submitter node, e.g. spinquestgpvm01.fnal.gov or gpvm02. The following steps won't work on other computers.
  • Source the setup script as follows. It sets up a bunch of shell variables and commands for job submission. There are many similar scripts for this purpose, but this one will be our officially-maintained version.
    source /exp/seaquest/app/software/script/setup-jobsub-spinquest.sh
    
  • You can add the following line to your ".bashrc" in order to auto-source the setup script.
    test "${HOSTNAME:0:13}" = 'spinquestgpvm' && source /exp/seaquest/app/software/script/setup-jobsub-spinquest.sh
    
  • You can source "setup-jobsub-seaquest.sh" in the same directory to select the SeaQuest jobsub environment.