Cluster Run - zward/Amua GitHub Wiki

Cluster Run

Amua models can be run on a cluster computer, improving performance by allowing computationally intensive analyses to be distributed and run in parallel.

As of v0.3.0, Amua supports running probabilistic sensitivity analysis (PSA) on a cluster; more cluster analysis options will be added in future versions. (Note: This feature assumes that you have access to a cluster computer and are familiar with distributed computing workflows.)

Cluster PSA

Selecting "Use iteration as RNG seed" will seed each simulation in the PSA with the iteration number (i.e. index within the job array on the cluster). This sets the seed for the first-order (Monte Carlo) uncertainty within each simulation. Specifying a PSA seed will set the RNG seed used to sample the parameters. This sets the seed for the second-order (parameter) uncertainty. If parameter sets are available as part of the model, selecting "Sample parameter sets" will sample a parameter set in each iteration.

Setting up the Amua cluster environment

Follow these steps to set up Amua on the cluster. This example is for a cluster running SLURM (a common scheduler for Linux clusters).

In your working directory on the cluster make a new directory called "Amua" (optional, but can be useful to organize your work). In this directory:

Create a directory called 'out' (optional, but it is useful to organize the job output/error files)
Transfer a copy of the Amua jar (e.g. Amua_0.3.0.jar)
Transfer a copy of your model (e.g. myModel.amua)
Define an Amua cluster run (Run -> Cluster Run) and transfer a copy of the created .xml file to the cluster (e.g. myClusterRun.xml)
Create a directory where you want the run output to be written (e.g. runs_date)
Create a bash script (e.g. sim.sh) to submit the jobs to the cluster scheduler
Submit the runs using the bash script (sbatch sim.sh)
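
The directory steps above can be sketched as a short shell script run on the cluster. This is a minimal sketch, not part of Amua: the variable names AMUA_DIR and RUNS_DIR are illustrative, the default location ($HOME/Amua) is an assumption, and the scp line in the comment uses a placeholder login (user@cluster).

```shell
#!/bin/bash
# Sketch of the directory layout described in the steps above.
# AMUA_DIR and RUNS_DIR are illustrative names (assumptions); adjust as needed.
AMUA_DIR="${AMUA_DIR:-$HOME/Amua}"
RUNS_DIR="$AMUA_DIR/runs_date"   # same example name as in the steps above

mkdir -p "$AMUA_DIR/out"   # job output/error files
mkdir -p "$RUNS_DIR"       # model output written by each run

# From your local machine, copy the jar, model, and cluster run file, e.g.:
#   scp Amua_0.3.0.jar myModel.amua myClusterRun.xml user@cluster:Amua/
# (user@cluster is a placeholder for your own login)

# Report which of the expected files are still missing before submitting
for f in Amua_0.3.0.jar myModel.amua myClusterRun.xml sim.sh; do
  [ -f "$AMUA_DIR/$f" ] || echo "missing: $AMUA_DIR/$f"
done
```

Once the script prints no "missing" lines, the directory contains everything the steps above call for.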

Sample bash script

This bash script creates a job array that runs 1000 iterations of a PSA in parallel. In a text editor create a file called sim.sh:

#!/bin/bash
#SBATCH -o /home/user/Amua/out/%A_%a.out  #where system output files should be written
#SBATCH -e /home/user/Amua/out/%A_%a.err  #where system error files should be written
#SBATCH -p short  #which cluster partition to use
#SBATCH -c 1  #how many cores per job to request.  This should match the number of threads specified in the model properties if multi-threading is used
#SBATCH --mem-per-cpu=8000M #how much memory to request per cpu
#SBATCH -t 720 #MM  #time (in minutes) to allow for the job
#SBATCH --array=1-1000  #job indices to create. 1-1000 will set up 1000 jobs with indices from 1-1000

/home/user/myJavaInstallation/bin/java -XX:+UseSerialGC -Xmx8000m -jar /home/user/Amua/Amua_0.3.0.jar myModel.amua myClusterRun.xml /home/user/Amua/runs_date/ "${SLURM_ARRAY_TASK_ID}"

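Note: Replace "user" with your username and "myJavaInstallation" with the path to your Java installation. Because myModel.amua and myClusterRun.xml are given as relative paths, submit the script from the directory that contains them (SLURM starts each job in the submission directory by default).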
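
After the array finishes, it can help to see which iterations wrote anything to their error file. The helper below is a hypothetical sketch (failed_tasks is not part of Amua or SLURM). It assumes error files are named jobid_taskid.err, as produced by the "-e .../out/%A_%a.err" line in the script above. A non-empty error file does not necessarily mean the run failed, so treat the result as a list of tasks to inspect.

```shell
#!/bin/bash
# List array task indices whose error file is non-empty.
# Assumes files are named <jobid>_<taskid>.err (from "#SBATCH -e .../%A_%a.err").
# failed_tasks is a hypothetical helper name, not an Amua or SLURM command.
failed_tasks() {
  local out_dir="$1" job_id="$2" f base
  for f in "$out_dir/${job_id}"_*.err; do
    [ -s "$f" ] || continue   # skip empty files (and an unmatched pattern)
    base="${f##*/}"           # strip the directory, e.g. 12345_17.err
    base="${base%.err}"       # strip the extension, e.g. 12345_17
    echo "${base#*_}"         # keep the task index, e.g. 17
  done | sort -n
}
```

For example, failed_tasks /home/user/Amua/out 12345 prints one task index per line for job 12345 (an illustrative job ID). Those indices can then be resubmitted with a command such as sbatch --array=17,42 sim.sh, since options given on the sbatch command line take precedence over the #SBATCH lines in the script.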

The arguments passed into Amua (all the text above after /home/user/Amua/Amua_0.3.0.jar) are:

myModel.amua: the Amua model you want to run on the cluster
myClusterRun.xml: the cluster run you want to perform
/home/user/Amua/runs_date/: the directory you want the model output to be written to
"${SLURM_ARRAY_TASK_ID}": the index of the job within the array - this will be passed into each run as the iteration and can be used to seed the simulation (see above)