# Software: MPI
:warning: This page is still being written. Please get in touch with the admins if you have any questions or suggestions.
## About MPI
MPI (Message Passing Interface) is a standard for communication between independent processes, usually over a network. MPI can be used to write programs that solve problems by working on many CPU cores in parallel, even if those CPU cores are spread across many different machines. To use MPI, programs have to be linked with a software library that provides the standard MPI functions. The MPI library we use on our cluster is called OpenMPI.
For a general introduction to MPI programming, see https://mpitutorial.com/tutorials/.
Using MPI effectively requires a basic understanding of the hardware in our machines, and how programs are executed on that hardware. See Tutorial: Parallel Computing. That page also explains the difference between MPI and multithreading/multiprocessing, which are types of shared memory parallelism.
In a nutshell, the point of MPI is to allow programs to communicate between physically separate machines, over a network. An MPI job starts many copies of the same program, typically one process per CPU core, possibly spread across several machines. Every process runs the same code, but each one can only see its own 'local' memory.
However, because the processes can send messages to each other over the network using MPI, they can still work together to solve a bigger problem. For example, they can each read in part of a big file, use that data to compute something, and then share the result with each other.
## Launching MPI Jobs
Like all jobs on the cluster, MPI jobs should be submitted to the Slurm batch queue using `sbatch`. The Slurm environment and the commands to start the job should be provided in a shell script. See our page on Slurm for more details.
For MPI jobs, the batch script needs to contain three things:
- A set of sbatch directives to allocate cluster resources for the job in a way that MPI can use efficiently;
- Environment variables and modules that configure the MPI environment;
- A job command line that starts your computation using an MPI-aware job controller (`mpirun` or `srun`).
The following sections walk through these three topics.
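Putting the three ingredients together, a minimal job script might look like the following sketch. The job name, partition, node counts, and program name here are placeholders; adjust them to match your own code and allocation.

```shell
#!/bin/bash
#SBATCH --job-name=mpi_example      # name shown in the queue (placeholder)
#SBATCH --nodes=2                   # number of machines to use
#SBATCH --ntasks-per-node=16        # MPI processes per machine
#SBATCH --time=01:00:00             # wall-time limit
#SBATCH --partition=compute         # placeholder: use a real partition name

# Configure the MPI environment
module purge
module load openmpi

# Start the computation with an MPI-aware job controller
mpirun ./my_mpi_program
```

With `--nodes=2 --ntasks-per-node=16`, this sketch asks Slurm for 32 tasks in total; when OpenMPI is built with Slurm support, `mpirun` reads the allocation itself, so you normally do not need to pass a process count explicitly.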
### SBATCH directives for MPI jobs

### MPI environment variables

### The job command line
Inside your job script, there are two commands that can be used to launch MPI jobs on the cluster:

- `mpirun`
- `srun`

For the time being, the correct choice depends on how your code was compiled. The bottom line is:

- Use `mpirun` if you compiled your MPI code yourself using the MPI libraries on the cluster;
- Use `srun` if you are using a code that has already been compiled with different MPI libraries, including python with the `mpi4py` module or any code you have installed via the `conda` package manager.
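For example, the final line of your job script would look like one of the following (`./my_code` and `my_script.py` are placeholders for your own executable or script):

```shell
# Code you compiled yourself against the cluster's openmpi module:
mpirun ./my_code

# Code built against conda-provided MPI, e.g. a python/mpi4py script:
srun python my_script.py
```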
### `mpirun` options for MPI

### `srun` options for MPI
## FAQs
**Q. For serial jobs we always use `srun`. What is the difference between `srun` and `mpirun`?**

There is not a big difference. `mpirun` is a job launcher that creates an environment for processes running on a cluster of machines to talk to each other using the MPI protocol. It is independent of batch job systems such as Slurm.
`srun` is an all-purpose job controller for jobs launched under Slurm. It works for serial and parallel jobs. Setting up the 'plumbing' to connect tasks on different machines is very similar to what `mpirun` does, but `srun` also does some extra work to implement the resource management and job control in Slurm.
If everything was set up correctly on our cluster, `srun` would completely replace `mpirun`. However, everything is not set up perfectly yet. With our current setup, `srun` will fail to launch codes that have been compiled with the libraries provided by `module load openmpi`. Since most codes that are run on the cluster will be linked with those libraries, they have to be started with `mpirun` (the `mpirun` command itself is provided by our `openmpi` module). In that case there is little or no difference between the two in terms of functionality and performance.
However, we have found that at least some MPI-capable codes built by the `conda` package manager (specifically those that use Python's `mpi4py` library) will fail if they are launched by our cluster `mpirun`, but run fine when launched by `srun`. We are still investigating exactly why this is the case.
In future, the system will be configured in such a way that `srun` will be the only option.
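If you are not sure which MPI libraries a compiled code was linked against, `ldd` can help. The binary name below is a placeholder, and the exact library paths you see will depend on how the code was built:

```shell
# List the shared libraries the binary depends on and filter for MPI
ldd ./my_code | grep -i mpi
# A path under the cluster's openmpi module tree suggests mpirun;
# a path inside a conda environment suggests srun.
```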
**Q. What if I wrote my own python code using the `mpi4py` python package? Should I use `mpirun` or `srun`?**

If you installed `mpi4py` using `conda`, probably `srun`, because the code will still be linked to the MPI C libraries from conda. If you installed `mpi4py` using pip (or built it yourself), it may be linked to our cluster MPI libraries, in which case you'll probably need `mpirun`.
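One quick way to check is to print the build configuration recorded by `mpi4py` itself (the `mpi4py.get_config()` call is part of its public API and reports which MPI compiler wrappers it was built with):

```shell
# Show the MPI compiler/libraries this mpi4py installation was built against
python -c "import mpi4py; print(mpi4py.get_config())"
```

If the reported paths point into a conda environment, launch with `srun`; if they point at the cluster's `openmpi` module, use `mpirun`.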