# Contributed Scripts (Examples)
Many thanks to Petra!
Run with:
`sbatch script_name.sh`
```bash
#!/bin/bash
# file parallel_same_node.sh
#SBATCH --job-name=parallel_same
#SBATCH -n 2

## Run two jobs in parallel on the SAME node (e.g. if each job requires a single GPU).
## Currently this also creates an additional empty output file "slurm-<jobid>.out";
## not sure how to avoid that.
srun -n1 echo "1" &> test_same_node_1.txt &   # imagine this was a task using GPU 0
srun -n1 echo "2" &> test_same_node_2.txt &   # imagine this was a task using GPU 1

wait   # without this, the batch script exits immediately and kills the background tasks
```
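The comments above imagine each `srun` task driving a different GPU. As a minimal sketch of how that could look in practice (assuming the node has two GPUs and that your program, here a hypothetical `train.py`, selects its device via the standard `CUDA_VISIBLE_DEVICES` variable), each background task can be pinned to its own GPU:

```bash
#!/bin/bash
# file parallel_same_node_gpus.sh -- hypothetical variant of the script above
#SBATCH --job-name=parallel_same_gpus
#SBATCH -n 2

# Pin each task to one GPU by restricting which device it can see.
# Assumes two GPUs on the node and a program that honours CUDA_VISIBLE_DEVICES.
CUDA_VISIBLE_DEVICES=0 srun -n1 python train.py &> test_same_node_1.txt &
CUDA_VISIBLE_DEVICES=1 srun -n1 python train.py &> test_same_node_2.txt &

wait   # keep the batch job alive until both background tasks have finished
```

Once the cluster supports `--gres=gpu:1` (see Raymond's template below), requesting GPUs through Slurm is preferable to manual pinning.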
```bash
#!/bin/bash
# file parallel_job_array.sh
#SBATCH --job-name=parallel_array
#SBATCH --array=1-2
#SBATCH -n 1
#SBATCH --output test_different_nodes_%a.txt

## Run two jobs in parallel as a job array ("embarrassingly parallel");
## each array task may be placed on a different node.
echo $SLURM_ARRAY_TASK_ID
```
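Both scripts are submitted the same way; a quick way to check that the array tasks ran (the file names match the scripts above) is:

```bash
sbatch parallel_job_array.sh     # prints: Submitted batch job <jobid>
squeue -u $USER                  # array tasks show up as <jobid>_1 and <jobid>_2
cat test_different_nodes_1.txt   # output of array task 1 (prints 1)
cat test_different_nodes_2.txt   # output of array task 2 (prints 2)
scancel <jobid>                  # cancel the whole job (or array) if needed
```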
... and many thanks to Raymond, who wrote: "Here's a template I've created for correctly running batch jobs (node independent) on the cluster. I hope it's useful to people running batch jobs":
```bash
#!/bin/bash
#SBATCH --job-name="SlurmBatchEx"   # This name will be the JOB_NAME in squeue
#SBATCH --nodes=1                   # Unless specifically using MPI etc., this should remain 1
# When --gres is supported on the cluster, the --gres=gpu:count parameter should be set here

# These parameters ensure the standard output/error streams are logged to files
#SBATCH --output _%u_%A_%a_%n_out.txt    # Standard output file (_User_ArrayJobID_TaskID_Node_out.txt)
#SBATCH --error _%u_%A_%a_%n_error.txt   # Standard error file (_User_ArrayJobID_TaskID_Node_error.txt)

# When the mail server is configured, these parameters can be used to notify users when errors occur
#SBATCH --mail-type=ALL                  # Send an email for start, end and abortion (BEGIN, END, FAIL, REQUEUE or ALL)
#SBATCH [email protected]  # Send email to this address

# This parameter defines how many copies of this script will run
#SBATCH --array=0-2   # Submit N jobs (use this parameter to run multiple jobs on multiple nodes)

# Source environment libraries and load modules here
source venv/bin/activate

# Create a parameter array so that each copy of this script runs with a different parameter
params=(
    configs/lab_fruits_75/rgb_multiview.json
    configs/lab_fruits_75/lab_multiview.json
    configs/lab_fruits_75/fusion_multiview.json
)

# Save some of the metadata to the output file (in case it crashes, it can be linked back to a job)
echo "SLURM JOB ID" $SLURM_JOB_ID
echo "SLURM TASK ID" $SLURM_ARRAY_TASK_ID
echo "SLURM PARAM" ${params[$SLURM_ARRAY_TASK_ID]}

# Run your code with the selected parameter
python train.py --config ${params[$SLURM_ARRAY_TASK_ID]}

# To run this script: 'sbatch batch_slurm_ex.sh'
## When run with sbatch, Slurm will run a copy of this script N times, depending on the --array attribute:
## --array=0-2 runs three copies with $SLURM_ARRAY_TASK_ID = 0, 1, 2
## --array=0-11 runs twelve copies with $SLURM_ARRAY_TASK_ID = 0, ..., 11
## To easily run different experiments, you can use bash arrays to run executables with alternate parameters
```
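Building on the last comment in Raymond's template, a job array can also sweep a small parameter grid by decoding `$SLURM_ARRAY_TASK_ID` with integer division and modulo. This is only a sketch; the learning rates, batch sizes and `train.py` flags are made-up placeholders:

```bash
#!/bin/bash
#SBATCH --job-name="GridSweepEx"
#SBATCH --nodes=1
#SBATCH --array=0-5                # 2 learning rates x 3 batch sizes = 6 tasks

lrs=( 0.01 0.001 )                 # hypothetical learning rates
batch_sizes=( 16 32 64 )           # hypothetical batch sizes

# Decode the flat task ID into a (learning rate, batch size) pair
lr=${lrs[$(( SLURM_ARRAY_TASK_ID / ${#batch_sizes[@]} ))]}
bs=${batch_sizes[$(( SLURM_ARRAY_TASK_ID % ${#batch_sizes[@]} ))]}

echo "Task $SLURM_ARRAY_TASK_ID: lr=$lr batch_size=$bs"
python train.py --lr $lr --batch-size $bs   # placeholder flags, adapt to your own code
```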