Execution

Running HYBIRD

Once HYBIRD is compiled, it can be run via terminal. A typical command line looks like this:

[Figure: a sample command line]

On a local computer, the executable created with the CPU version can be run by preparing a series of files and folders (see the Tutorials section) and executing the following command in the terminal:

OMP_NUM_THREADS={n_p} ./hybird.exe -c path/to/config/file.cfg -d path/to/storage/folder -n simulation_name {other options}
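
As a concrete instance of this template, with placeholder file and folder names chosen here only for illustration:

OMP_NUM_THREADS=4 ./hybird.exe -c ./input/dambreak.cfg -d ./results -n dambreak_test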

The number substituted for n_p sets the number of processors (OpenMP threads) used to run the simulation. For the GPU version, the OMP_NUM_THREADS prefix can be omitted. The other options are described below:

  • -c path/to/config/file.cfg: This option is compulsory. It specifies the location of the configuration file, which can have any extension (.cfg is used throughout this manual).

  • -d path/to/storage/folder: Specifies the location of the directory where the results are written. The directory must exist before the code starts running. Note that every simulation run then creates a subfolder within this directory.

  • -n simulation_name: Specifies the name of the simulation, which becomes the name of the subfolder created by the simulation. If the name is specified as -n time, the subfolder is named with the date and time at which the simulation starts.

  • Other options: Any parameter set in the configuration file can be overridden by adding -{config_parameter} {value} to the command line. This is particularly useful when running a large set of simulations, because it removes the need for multiple configuration files (see the example after this list).
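
For instance, assuming the configuration file defines a parameter named fluidTimeStep (a hypothetical name used only for illustration), a run can override it directly from the command line without editing the file:

OMP_NUM_THREADS=4 ./hybird.exe -c ./input/dambreak.cfg -d ./results -n dambreak_dt1 -fluidTimeStep 0.0001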

Running HYBIRD on an HPC system (e.g. Stanage @ Sheffield University)

On a cluster, it is not a good idea to run the code by issuing the command directly. Instead, an additional file, a "job file", must be prepared; it can be created with any text editor and has the extension .sh, e.g. myjobfile1.sh. The job file contains the information necessary to run the code, in addition to the command line itself. It contains several options and is needed so that the cluster can manage its use by multiple users efficiently. An example job file is provided in the manual folder.

[Figure: a sample job file]

CPU version

The typical job file template for the CPU version is the following:

#!/bin/bash
#SBATCH --job-name=sim_name                # name of the simulation. This is the job name that will appear in the cluster queue and helps to identify the simulation. It should be meaningful but short (a maximum of 6 characters is recommended) and should not contain spaces. It is a good idea to use the same name for the results folder (the option "-n").
#SBATCH --nodes=1                          # number of computers (nodes) to be used. HYBIRD is optimized for a single node, so "1" is the correct choice.
#SBATCH --ntasks-per-node=1                # number of tasks per node
#SBATCH --ntasks=1                         # number of tasks
#SBATCH --cpus-per-task=4                  # number of cores (must match the value of export OMP_NUM_THREADS set below)
#SBATCH --mem=20G                          # RAM to be allocated per node. The amount needs to be estimated with a good level of precision: simulations that exceed the reserved memory will be killed, while reserving far more than is needed wastes resources and can increase queueing time. The memory required can be estimated by running the simulation on a local machine and checking the memory consumed by hybird.exe in the task manager.
#SBATCH --output=./results/output.txt      # text file for the output. This is the output that, on a local simulation, would be displayed on the terminal.
#SBATCH --error=./results/error.txt        # text file for errors
#SBATCH --time=96:00:00                    # timeout of the simulation
#SBATCH --mail-user=$(email)               # email address for HPC notifications (queue time, simulation begin and end)
#SBATCH --mail-type=ALL                    # which job events trigger an email alert (ALL = begin, end, fail, etc.)

# load modules
module load CMake
module load GCC/12.3.0

# HYBIRD command line
export OMP_NUM_THREADS=4
./hybird -c $(configurationFileDirectory/configurationFileName.cfg) -d ./$(resultsDirectory) -n $(resultFolderName)

Note that the job file also needs a line that sets the number of threads: export OMP_NUM_THREADS=N, where N is the number of threads (4 in the example). It is not recommended to use the option -n time here to name the simulation with a timestamp; always use a fixed name. Important: every option in the configuration file that includes a file path should be updated so that it makes sense within the cluster folder structure. This includes the particle file, even if it is empty.
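
As a rough way of checking how much memory a finished test job actually used (helpful for tuning --mem), the standard SLURM accounting command sacct can be queried; the seff summary script gives similar information on clusters where it is installed. Replace jobID with the real ID reported by the queue:

# maximum resident memory (MaxRSS) and runtime of a completed job
sacct -j jobID --format=JobID,JobName,MaxRSS,Elapsed,State

# alternative summary, including memory efficiency (if seff is available)
seff jobID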

Running HYBIRD GPU on Stanage @ Sheffield University

A job file template for the GPU version is the following:

#!/bin/bash
#SBATCH --job-name=sim_name              # name of the simulation. This is the job name that will appear in the cluster queue and helps to identify the simulation. It should be meaningful but short (a maximum of 6 characters is recommended) and should not contain spaces. It is a good idea to use the same name for the results folder (the option "-n").
#SBATCH --partition=gpu-h100             # H100 GPU Partition (same performance as A100 for HYBIRD, probably less busy)
#SBATCH --qos=gpu
#SBATCH --gres=gpu:1                     # 1 GPU, quarter of a node
#SBATCH --mem=80G                        # 80GB RAM, quarter of a node
#SBATCH --cpus-per-task=12               # 12 cores, quarter of a node
#SBATCH --time=0-01:00                   # time limit (DD-HH:MM); set it to the expected duration of the simulation
#SBATCH --output=./results/output.txt    # text file for the output. This is the output that, on a local simulation, would be displayed on the terminal.
#SBATCH --error=./results/error.txt      # text file for errors
#SBATCH --mail-user=$(email)             # email with HPC update (queue time, begin, and end simulation)
#SBATCH --mail-type=ALL                  # which job events trigger an email alert (ALL = begin, end, fail, etc.)

# Load same modules as used to compile HYBIRD
module load CMake
module load GCC/12.3.0
module load CUDA/12.4.0

# HYBIRD command line
./hybird -c $(configurationFileDirectory/configurationFileName.cfg) -d ./$(resultsDirectory) -n $(resultFolderName)
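
As an optional sanity check, not part of the template above, a line such as the following can be added to the job file just before the HYBIRD command line to confirm that a GPU has actually been allocated to the job (nvidia-smi is available once the CUDA module is loaded):

# optional: list the GPU(s) visible to this job before starting the simulation
nvidia-smi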

Submitting the jobfile & executing

Once the job file is ready, the simulation folder should have the following elements (a sketch of how to prepare it follows the list):

  • The executable (e.g. "hybird.exe")

  • A results folder

  • A subfolder inside the results folder, named identically to the simulation name declared in the job file (d2 in the example). Alternatively, the option "-n time" can be used to name the simulation with a timestamp

  • The configuration file

  • Any other input files, for example particle files or a YAML geometry file
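
A minimal sketch of how the simulation folder could be prepared from the terminal, using placeholder file names and the simulation name d2 from the example above:

mkdir -p ./results/d2              # results folder and simulation subfolder
cp path/to/hybird.exe .            # the executable
cp path/to/file.cfg .              # the configuration file
cp path/to/particles.dat .         # any other input files (particle file, YAML geometry, ...)
cp path/to/myjobfile1.sh .         # the job file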

Once this is ready, it is time to submit the job. Log into the cluster using a terminal (e.g. Cygwin or WSL).
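
For Stanage, the login command is typically an ssh call of the form shown below; the exact hostname should be checked against the cluster documentation:

ssh cluster_username@stanage.shef.ac.uk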

Substitute cluster_username with your personal username in the cluster, answer "yes" to the security question, and type your password when requested.

You are now connected to the cluster through the terminal. Navigate to the simulation folder and type:

sbatch jobfile_name.sh

where jobfile_name should be substituted with the name of your job file. The job will then be queued, and will run as soon as the requested resources are available. You will receive an email when this happens. To check the status of the simulation, run:

squeue -u cluster_username

Substitute cluster_username with your personal username in the cluster. You will see a list of all submitted jobs with their status: R means that a job is running, PD means that it is pending (waiting in the queue), S means suspended, and so on. Note that a numerical ID is associated with every job. If you want to kill a particular job, run:

scancel jobID

and to cancel all jobs, run:

scancel -u cluster_username

When the job is finished, you will receive an email. A more comprehensive guide to the SLURM system is available at https://slurm.schedmd.com/quickstart.html. Should the simulation fail, you can check what went wrong by looking inside the error file in the results folder. If you cannot find this file, the folder was probably not prepared properly before the job file was submitted.
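
For example, with the paths used in the job file templates above, the end of the output and error files can be inspected directly from the cluster terminal:

tail -n 50 ./results/output.txt
tail -n 50 ./results/error.txt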
