Submit jobs to the grid

How the Grid Works

See page 9 of DocDB 5509.

Environment Setup

  • Log on to a submitter node, e.g. spinquestgpvm01.fnal.gov or spinquestgpvm02. The following steps won't work on other computers.
  • Source the setup script as follows. It defines the shell variables and commands needed for job submission. Several similar scripts exist for this purpose, but this one is the officially maintained version.
    source /exp/seaquest/app/software/script/setup-jobsub-spinquest.sh
    
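  • You can add the following line to your ".bashrc" to source the setup script automatically on the submitter nodes. The host-name substring is quoted so that the test does not fail with a syntax error when HOSTNAME is empty.
    test "${HOSTNAME:0:13}" = 'spinquestgpvm' && source /exp/seaquest/app/software/script/setup-jobsub-spinquest.sh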
    
  • To select the SeaQuest jobsub environment instead, source "setup-jobsub-seaquest.sh" in the same directory.
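The one-line ".bashrc" check above can also be written in a slightly more defensive form. The following is a minimal sketch, not the officially maintained setup: the function name is_spinquest_gpvm and the variable SPINQUEST_JOBSUB_SETUP are hypothetical names introduced here for illustration. It sources the script only on a host whose name starts with "spinquestgpvm" and only if the script is readable.

```shell
# Sketch of a ~/.bashrc fragment (bash).
# is_spinquest_gpvm and SPINQUEST_JOBSUB_SETUP are hypothetical names.
SPINQUEST_JOBSUB_SETUP=/exp/seaquest/app/software/script/setup-jobsub-spinquest.sh

# Succeed only if the given host name starts with "spinquestgpvm".
is_spinquest_gpvm() {
    case "$1" in
        spinquestgpvm*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Source the setup script only on a submitter node where it is readable.
if is_spinquest_gpvm "${HOSTNAME:-}"; then
    if [ -r "$SPINQUEST_JOBSUB_SETUP" ]; then
        source "$SPINQUEST_JOBSUB_SETUP"
    else
        echo "jobsub setup script not readable: $SPINQUEST_JOBSUB_SETUP" >&2
    fi
fi
```

On any other host the fragment does nothing, so the same ".bashrc" can be shared across machines.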

Basic Procedure

You can learn how to use the grid by working through the following procedure.

  1. Set up the environment as described above.
  2. Get a Kerberos ticket.
    kinit
    <type your password>
    
  3. Modify a job-submission script based on your needs.
  4. Test-run the job locally.
    ./gridsub.sh test-10k 0 2 10000
    
  5. Test-run a small batch of jobs on the grid.
    ./gridsub.sh test-10k 1 2 10000
    
    • You will be asked to copy an authentication URL and paste it into your web browser.
    • You can run jobsub_q --user=$USER (or its alias jobsub_q_mine) to check the status of your jobs.
  6. Run your jobs.
    • Adjust the command-line arguments of gridsub.sh as needed.
    • You can use the FIFE monitor (Fifemon) to check the job status.
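Steps 4 and 5 and the status check can be strung together as sketched below. This is only an illustration of the order of operations: run_step, staged_test and DRY_RUN are hypothetical helpers that are not part of gridsub.sh or jobsub, and the gridsub.sh arguments are copied unchanged from the steps above. By default the sketch only prints the commands, so it is safe to run anywhere.

```shell
#!/bin/bash
# Dry-run sketch of the staged test procedure.
# run_step, staged_test and DRY_RUN are hypothetical helpers.
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands; 0: actually run them

# Print the command in dry-run mode, otherwise execute it.
run_step() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}

# Run the steps in order and stop at the first one that fails.
staged_test() {
    run_step ./gridsub.sh test-10k 0 2 10000 || return 1   # step 4: local test
    run_step ./gridsub.sh test-10k 1 2 10000 || return 1   # step 5: small batch
    run_step jobsub_q --user="${USER:-$(id -un)}"           # check job status
}

staged_test
```

Setting DRY_RUN=0 on a submitter node, after sourcing the setup script and running kinit, would execute the commands for real. The authentication URL prompt in step 5 is interactive, so the sequence cannot run unattended the first time.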

Write Permission

  • A normal E1039 user can
    • Write output files only under /pnfs/e1039/scratch/users and
    • Read all files under /pnfs/e1039 and /pnfs/e906.
  • Only users who have been granted the Production role can write output files to any directory under /pnfs/e1039.
    • This role is mainly for data production.
    • The users assigned the Production role are listed on this Fifemon page. If the link does not work, you can manually
      1. Visit Fifemon,
      2. Search for User Info to open that dashboard, and
      3. Set Experiments to spinquest and Role to Production.
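A job that tries to write outside the permitted area is better caught before submission. The check below is a minimal sketch of the rule for normal users stated above. The function name is_user_writable_dir and the example directory layout (a per-user subdirectory named after $USER) are assumptions made for illustration; the check does not cover users with the Production role.

```shell
# Hypothetical pre-submission check for normal (non-Production) users.
# Succeeds only for paths under /pnfs/e1039/scratch/users.
is_user_writable_dir() {
    case "$1" in
        */../*|*/..)                 return 1 ;;  # refuse ".." path components
        /pnfs/e1039/scratch/users/*) return 0 ;;
        *)                           return 1 ;;
    esac
}

# Example: the per-user directory layout below is an assumption.
out_dir="/pnfs/e1039/scratch/users/${USER:-$(id -un)}/test-10k"
if is_user_writable_dir "$out_dir"; then
    echo "OK: $out_dir"
else
    echo "Not writable for normal users: $out_dir" >&2
fi
```

The check is purely string-based: it does not touch the file system, so it cannot tell whether the directory exists or whether dCache would actually accept the write.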

Notes

  • The back-end system for job submission at Fermilab was changed to jobsub_lite in February 2023. Details are in DocDB 10460.

Tips

How to exclude bad OSG nodes

Use the "--append_condor_requirements" option of "jobsub_submit" as follows:

 cmd="$cmd --append_condor_requirements='(TARGET.GLIDEIN_Site isnt \"UCSD\")'"

This option is already used by default in "e1039-analysis/SimChainDev/gridsub.sh". Valid site names are listed in this Wiki page. Note that, according to the Fermilab Service Desk, the "--blacklist" option had a known defect as of August 2019.
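To exclude more than one site, the individual conditions can be joined with the ClassAd "&&" operator. The sketch below builds such an expression; build_site_exclusion is a hypothetical helper, "ExampleSite" is a placeholder rather than a real site name, and it is an assumption here that jobsub_submit accepts a compound expression for this option in the same way as the single condition shown above.

```shell
# Hypothetical helper: build a requirements expression excluding each given site.
build_site_exclusion() {
    local req="" clause site
    for site in "$@"; do
        clause="(TARGET.GLIDEIN_Site isnt \"$site\")"
        if [ -z "$req" ]; then
            req="$clause"
        else
            req="$req && $clause"
        fi
    done
    printf '%s\n' "$req"
}

# "jobsub_submit" stands in for the command line that gridsub.sh builds.
cmd="jobsub_submit"
cmd="$cmd --append_condor_requirements='$(build_site_exclusion UCSD ExampleSite)'"
echo "$cmd"
```

With a single argument the helper produces exactly the expression used in the line above, so it can replace the hard-coded string without changing the default behavior.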