# Contributed Scripts (Examples)
Many thanks to Petra!
Run with:
`sbatch script_name.sh`
```bash
#!/bin/bash
# file parallel_same_node.sh
#SBATCH --job-name=parallel_same
#SBATCH -n 2

## Run two jobs in parallel on the SAME node (e.g. if each job requires a single GPU).
## Currently this also creates an additional empty output file "slurm-<jobid>.out";
## not sure how to avoid that.
srun -n1 echo "1" &> test_same_node_1.txt &   # imagine this was a task using GPU 0
srun -n1 echo "2" &> test_same_node_2.txt &   # imagine this was a task using GPU 1

wait   # without this, the batch script exits immediately and kills the background tasks
```
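The comments above imagine each `srun` task driving a different GPU. As a minimal sketch of how that could look in practice (assuming the node has two GPUs and that your program, here a hypothetical `train.py`, selects its device via the standard `CUDA_VISIBLE_DEVICES` variable), each background task can be pinned to its own GPU:

```bash
#!/bin/bash
# file parallel_same_node_gpus.sh -- hypothetical variant of the script above
#SBATCH --job-name=parallel_same_gpus
#SBATCH -n 2

# Pin each task to one GPU by restricting which device it can see.
# Assumes two GPUs on the node and a program that honours CUDA_VISIBLE_DEVICES.
CUDA_VISIBLE_DEVICES=0 srun -n1 python train.py &> test_same_node_1.txt &
CUDA_VISIBLE_DEVICES=1 srun -n1 python train.py &> test_same_node_2.txt &

wait   # keep the batch job alive until both background tasks have finished
```

Once the cluster supports `--gres=gpu:1` (see Raymond's template below), requesting GPUs through Slurm is preferable to manual pinning.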
```bash
#!/bin/bash
# file parallel_job_array.sh
#SBATCH --job-name=parallel_array
#SBATCH --array=1-2
#SBATCH -n 1
#SBATCH --output test_different_nodes_%a.txt

## Run two jobs in parallel as a job array ("embarrassingly parallel");
## each array task may be placed on a different node.
echo $SLURM_ARRAY_TASK_ID
```
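Both scripts are submitted the same way; a quick way to check that the array tasks ran (the file names match the scripts above) is:

```bash
sbatch parallel_job_array.sh     # prints: Submitted batch job <jobid>
squeue -u $USER                  # array tasks show up as <jobid>_1 and <jobid>_2
cat test_different_nodes_1.txt   # output of array task 1 (prints 1)
cat test_different_nodes_2.txt   # output of array task 2 (prints 2)
scancel <jobid>                  # cancel the whole job (or array) if needed
```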
... and many thanks to Raymond, who wrote: "Here's a template I've created for correctly running batch jobs (node independent) on the cluster. I hope it's useful to people running batch jobs":
```bash
#!/bin/bash
#SBATCH --job-name="SlurmBatchEx"   # This name will be the JOB_NAME in squeue
#SBATCH --nodes=1                   # Unless specifically using MPI etc., this should remain 1
# When --gres is supported on the cluster, the --gres=gpu:count parameter should be set here

# These parameters ensure the standard output/error streams are logged to files
#SBATCH --output _%u_%A_%a_%n_out.txt    # Standard output file (_User_ArrayJobID_TaskID_Node_out.txt)
#SBATCH --error _%u_%A_%a_%n_error.txt   # Standard error file (_User_ArrayJobID_TaskID_Node_error.txt)

# When the mail server is configured, these parameters can be used to notify users when errors occur
#SBATCH --mail-type=ALL                  # Send an email for start, end and abortion (BEGIN, END, FAIL, REQUEUE or ALL)
#SBATCH [email protected]  # Send email to this address

# This parameter defines how many copies of this script will run
#SBATCH --array=0-2   # Submit N jobs (use this parameter to run multiple jobs on multiple nodes)

# Source environment libraries and load modules here
source venv/bin/activate

# Create a parameter array so that each copy of this script runs with a different parameter
params=(
    configs/lab_fruits_75/rgb_multiview.json
    configs/lab_fruits_75/lab_multiview.json
    configs/lab_fruits_75/fusion_multiview.json
)

# Save some of the metadata to the output file (in case it crashes, it can be linked back to a job)
echo "SLURM JOB ID" $SLURM_JOB_ID
echo "SLURM TASK ID" $SLURM_ARRAY_TASK_ID
echo "SLURM PARAM" ${params[$SLURM_ARRAY_TASK_ID]}

# Run your code with the selected parameter
python train.py --config ${params[$SLURM_ARRAY_TASK_ID]}

# To run this script: 'sbatch batch_slurm_ex.sh'
## When run with sbatch, Slurm will run a copy of this script N times, depending on the --array attribute:
## --array=0-2 runs three copies with $SLURM_ARRAY_TASK_ID = 0, 1, 2
## --array=0-11 runs twelve copies with $SLURM_ARRAY_TASK_ID = 0, ..., 11
## To easily run different experiments, you can use bash arrays to run executables with alternate parameters
```
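Building on the last comment in Raymond's template, a job array can also sweep a small parameter grid by decoding `$SLURM_ARRAY_TASK_ID` with integer division and modulo. This is only a sketch; the learning rates, batch sizes and `train.py` flags are made-up placeholders:

```bash
#!/bin/bash
#SBATCH --job-name="GridSweepEx"
#SBATCH --nodes=1
#SBATCH --array=0-5                # 2 learning rates x 3 batch sizes = 6 tasks

lrs=( 0.01 0.001 )                 # hypothetical learning rates
batch_sizes=( 16 32 64 )           # hypothetical batch sizes

# Decode the flat task ID into a (learning rate, batch size) pair
lr=${lrs[$(( SLURM_ARRAY_TASK_ID / ${#batch_sizes[@]} ))]}
bs=${batch_sizes[$(( SLURM_ARRAY_TASK_ID % ${#batch_sizes[@]} ))]}

echo "Task $SLURM_ARRAY_TASK_ID: lr=$lr batch_size=$bs"
python train.py --lr $lr --batch-size $bs   # placeholder flags, adapt to your own code
```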