# Running simulations on the TACC Stampede2

This page explains how to set up and run the iEBE-MUSIC framework on the Stampede2 cluster at the Texas Advanced Computing Center (TACC).
## First-time setup (only needs to be done once)

Log in to Stampede2 using

```shell
ssh [UserName]@stampede2.tacc.utexas.edu
```

(The login node may differ from user to user.)

Once you are logged in, request an interactive node with

```shell
idev
```

Then load the required module:

```shell
module load tacc-singularity
```
- Download the Singularity image of the iEBE-MUSIC framework:

  ```shell
  mkdir -p $WORK/singularity_repos/
  cd $WORK/singularity_repos/
  singularity pull docker://chunshen1987/iebe-music:dev
  ```
- Download the iEBE-MUSIC framework code into your work directory:

  ```shell
  cd $WORK
  git clone https://github.com/chunshen1987/iEBE-MUSIC -b dev
  ```
After you finish these steps, type `exit` to leave the computing node and return to the login node for the job submissions described below.
## Running Simulations

To generate and run a batch of simulations, start from the framework directory under your work directory:

```shell
cd $WORK/iEBE-MUSIC/
./generate_singularity_jobs.py -w $SCRATCH/[runDirectory] -c stampede2 --node_type SKX -n 24 -n_hydro 1 -n_th 4 -par [parameterFile.py] -singularity $WORK/singularity_repos/iebe-music_dev.sif -b [bayesParamFile]
```
Here `[runDirectory]` should be replaced with a name describing the collision system, and `[parameterFile.py]` with the actual parameter file. You can find example parameter files in the `config/` folder; all the model parameters are listed in the `config/parameters_dict_master.py` file. The training parameters for the Bayesian emulator can be specified using the `-b` option with a parameter file `[bayesParamFile]`. Please note that you must provide the absolute path for the `[bayesParamFile]`.

The option `-n` specifies the number of jobs to run. On Stampede2, we recommend setting the number of hydro events to simulate per job, `-n_hydro`, to 1 so that the shortest walltime can be requested for each batch of jobs. With `-n_hydro 1`, the option `-n` effectively sets the total number of events to simulate in the batch.
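The batch-size arithmetic can be checked with a quick shell computation; the numbers below are just the example values from the command above (`-n 24`, `-n_hydro 1`):

```shell
# Total events in a batch = (number of jobs) x (hydro events per job).
N_JOBS=24      # value passed to -n
N_HYDRO=1      # value passed to -n_hydro
echo $((N_JOBS * N_HYDRO))
```

With `-n_hydro 1`, the printed total simply equals the job count, which is why `-n` directly sets the batch size.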
On Stampede2, the option `--node_type` accepts `SKX`, `KNL`, and `ICX`. When using `KNL` nodes, we recommend setting the number of OpenMP threads with `-n_th 16`; for `SKX` and `ICX` nodes, use `-n_th 4`.
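The thread count determines how many simulation processes fit on one node. As a sketch, assuming 48 physical cores per SKX node (a figure you should verify against the TACC hardware documentation for your node type):

```shell
# Processes per node = (cores per node) / (OpenMP threads per process).
# 48 cores per SKX node is an assumption here; KNL and ICX nodes differ.
CORES_PER_NODE=48
N_TH=4         # value passed to -n_th for SKX
echo $((CORES_PER_NODE / N_TH))
```

A larger `-n_th` means fewer, more parallel processes per node; the recommended values above keep the node fully occupied without oversubscribing cores.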
If you saved the Singularity image under a different name, you need to replace `iebe-music_dev.sif` with your customized name.

The full help message can be viewed with `./generate_singularity_jobs.py -h`:
```
usage: generate_singularity_jobs.py [-h] [-w] [-c] [--node_type] [-n]
                                    [-n_hydro] [-n_th] [-par] [-singularity]
                                    [-exe] [-b] [-seed]

⚛ Welcome to iEBE-MUSIC package

  -h, --help            show this help message and exit
  -w , --working_folder_name
                        working folder path (default: playground)
  -c , --cluster_name   name of the cluster (default: local)
  --node_type           name of the queue (work on stampede2) (default: SKX)
  -n , --n_jobs         number of jobs (default: 1)
  -n_hydro , --n_hydro_per_job
                        number of hydro events per job to run (default: 1)
  -n_th , --n_threads   number of threads used for each job (default: 1)
  -par , --par_dict     user-defined parameter dictionary file (default:
                        parameters_dict_user.py)
  -singularity , --singularity
                        path of the singularity image (default: iebe-
                        music_latest.sif)
  -exe , --executeScript
                        job running script (default:
                        Cluster_supports/WSUgrid/run_singularity.sh)
  -b , --bayes_file     parameters from bayesian analysis (default: )
  -seed , --random_seed
                        Random Seed (-1: according to system time) (default: -1)
```
After running this script, the working directory `$SCRATCH/[runDirectory]` will be created. You can then submit the jobs with

```shell
cd $SCRATCH/[runDirectory]
sbatch submit_MPI_jobs.script
```
Make sure you are on the login node when you submit jobs. If you are on a computing node (for example, inside an `idev` session), the system will return an error when you try to submit a job. In that case, exit the computing node, return to the login node, and submit the job from there.

While the jobs are running, you can check their progress with `squeue -u $USER`.
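On the cluster you would pipe `squeue -u $USER` directly; as a self-contained sketch, the snippet below parses a saved (hypothetical) `squeue` snapshot and counts jobs per Slurm state, where `R` means running and `PD` pending:

```shell
# Count jobs per Slurm state. The sample output below is hypothetical;
# on Stampede2, replace `echo "$sample"` with `squeue -u $USER`.
sample='JOBID PARTITION NAME    USER  ST TIME NODES
123456 skx-norm event_1 alice R  0:42 1
123457 skx-norm event_2 alice R  0:40 1
123458 skx-norm event_3 alice PD 0:00 1'
echo "$sample" | tail -n +2 | awk '{print $5}' | sort | uniq -c
```

The `tail -n +2` skips the header line, and the fifth column holds the job state, so the pipeline prints how many jobs are in each state.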
After the simulations finish, the final results will be automatically copied from `$SCRATCH/[runDirectory]` to `$WORK/RESULTS/[runDirectory]`.