7. Running Jobs - bregord/emcee-on-calcul-quebec GitHub Wiki
Relevant Calcul Quebec wiki page
## Job Submission and Scheduling Overview
To ensure fair and efficient use of resources, all Calcul Québec systems use the Torque resource manager together with either the Moab or the Maui job scheduler.
The basic workflow is as follows:

1. Jobs, in the form of submission scripts, are submitted to Moab or Maui with the `msub` (Moab) or `qsub` command.
2. The scheduler places each job in a queue with other jobs, ordered by a priority algorithm that favors users who have used less of their allocated time than others.
3. When a job reaches the top of the queue and the requested resources are available, the scheduler hands the submission script to Torque, which allocates the resources and runs it.
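As a concrete illustration of the first step, the snippet below writes a minimal Torque submission script named `script.sh`. The job name `emcee_run`, the resource requests (one node with 8 cores for two hours) and `my_emcee_script.py` are placeholders, not Calcul Québec recommendations; some systems may also require extra directives, such as an allocation or project identifier, so check your system's guide. `PBS_O_WORKDIR` is set by Torque to the directory `qsub` was run from.

```shell
#!/bin/bash
# Write a minimal Torque submission script to script.sh.
# emcee_run, the resource requests and my_emcee_script.py are placeholders.
cat > script.sh <<'EOF'
#!/bin/bash
#PBS -N emcee_run
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00

# Torque starts jobs in $HOME; move to the directory qsub was called from.
cd "$PBS_O_WORKDIR"

python my_emcee_script.py
EOF
```

The quoted `'EOF'` delimiter keeps `$PBS_O_WORKDIR` literal in the generated file, so it is expanded only when the job runs on the compute node.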
Other commands for interacting with Torque, Maui and Moab (for example, to see the available resources or your job's status in the queue) are documented at the links below.
Maui commands (For Guillimin and Colosse only)
## Submitting a Job
### Running a Test
Most systems have a debug queue that runs jobs almost immediately, subject to a few constraints, typically a short time limit. These queues are useful for checking that your job runs correctly. However, an emcee script with a large number of parameters, dimensions and/or walkers often cannot finish within those constraints. It is therefore recommended to create a test version of your Python script that uses fewer parameters, dimensions and/or walkers.
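One way to keep a single submission script for both cases is to let it choose the emcee settings from an environment variable. In the sketch below, `TEST_RUN`, `emcee_settings` and the `--nwalkers`/`--nsteps` options of `my_emcee_script.py` are all hypothetical; adapt them to however your own script reads its settings. The walker and step counts are illustrative only, and the number of walkers should still be at least twice the number of dimensions, which emcee's default move expects.

```shell
#!/bin/bash
# Print command-line options for a quick test run or a full run.
# TEST_RUN and the --nwalkers/--nsteps options are hypothetical names.
emcee_settings() {
    local nwalkers nsteps
    if [ "${TEST_RUN:-0}" = "1" ]; then
        # Small values so the run fits within a debug queue's limits.
        nwalkers=16
        nsteps=100
    else
        # Full production values (illustrative only).
        nwalkers=256
        nsteps=5000
    fi
    echo "--nwalkers $nwalkers --nsteps $nsteps"
}
```

Defined inside the submission script, the function could be used as `python my_emcee_script.py $(emcee_settings)`, and a test run submitted with `qsub -v TEST_RUN=1 script.sh`, since `-v` exports the listed variables into the job's environment. How the debug queue itself is selected is system-specific; see your system's guide.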
### Running
To submit a job, run the `qsub` command followed by the name of your submission script:

    qsub script.sh
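`qsub` prints the identifier of the newly created job on standard output, on Torque typically of the form `12345.servername`. The sketch below captures it so it can later be passed to `qstat` or `qdel`, keeping only the numeric part, which is what appears in Torque's output file names. `submit_job` is a hypothetical helper, not a Torque command.

```shell
#!/bin/bash
# Submit a script with qsub and print the numeric part of the job ID.
# submit_job is a hypothetical helper name.
submit_job() {
    local script="$1" full_id
    # Declared separately so a qsub failure is not masked by "local".
    full_id=$(qsub "$script") || return 1
    # Strip everything from the first "." (the server name part).
    echo "${full_id%%.*}"
}

# Example: JOBID=$(submit_job script.sh) && echo "submitted job $JOBID"
```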
To interact with your job while it is queued or running, Torque, Maui and Moab provide separate commands (not `qsub` options), documented at the links provided above. The most commonly used ones are:
To see a list of jobs:

    qstat

To see only the jobs associated with you:

    qstat -a -u $USER

To remove a job:

    qdel JOBID
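When a job is expected to finish soon, it can be convenient to wait for it from the command line before retrieving results. The sketch below polls `qstat -f`, whose output includes a `job_state` line (for example `Q` for queued, `R` for running, `C` for completed), and returns once the job is completed or no longer known to the server. The name `wait_for_job` and the 60-second default interval are assumptions; poll sparingly, since frequent `qstat` calls add load on the scheduler.

```shell
#!/bin/bash
# Wait until a Torque job is completed or no longer listed by qstat.
# wait_for_job and the default interval (seconds) are hypothetical choices.
wait_for_job() {
    local jobid="$1" interval="${2:-60}" state
    while true; do
        # qstat -f prints a line such as "    job_state = R".
        state=$(qstat -f "$jobid" 2>/dev/null | awk '/job_state/ {print $3}')
        # Empty state: the server no longer knows the job. C: completed.
        if [ -z "$state" ] || [ "$state" = "C" ]; then
            return 0
        fi
        sleep "$interval"
    done
}

# Example: wait_for_job "$JOBID" 120 && echo "job $JOBID has finished"
```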
## Retrieving Results
To retrieve results, use the transfer-from flag of the provided script, or use `scp`. The results consist of whatever your submission script writes out, plus the output files generated by Torque, which are described in the next section.
### Torque Output
Unless the `-o` or `-e` options were used in your submission script, Torque automatically writes two files named `YOUR_JOB_NAME.oNNNNNN` and `YOUR_JOB_NAME.eNNNNNN`, where `NNNNNN` is the job ID assigned to your job. The `.o` file holds the job's standard output and the `.e` file its standard error.
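After a job finishes, a quick look at both files shows whether it succeeded. The sketch below assumes the default naming described above (no `-o` or `-e` options) and that the files are in the current directory, which by default is the directory `qsub` was run from; `check_job_output` is a hypothetical helper name. If the error file is non-empty it is printed first with a warning, since a Python traceback from a failed emcee run would end up there.

```shell
#!/bin/bash
# Show Torque's default output files for a finished job.
# Assumes files named NAME.oID and NAME.eID in the current directory.
check_job_output() {
    local name="$1" id="$2"
    local out="${name}.o${id}" err="${name}.e${id}"

    if [ ! -f "$out" ]; then
        echo "no output file $out yet" >&2
        return 1
    fi
    # A non-empty error file may hold a traceback or warnings.
    if [ -s "$err" ]; then
        echo "WARNING: $err is not empty:"
        cat "$err"
    fi
    cat "$out"
}

# Example: check_job_output emcee_run 12345
```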
## More Information
All of Calcul Québec's systems differ slightly in their configurations. More information about the queues available, how memory allocation is handled and other details can be found in the "Guide To Briaree/Colosse/Guillimin" pages of this wiki, or on the Calcul Québec wiki here.