MPI Libraries - ciemat-tic/codec GitHub Wiki

We are currently using MVAPICH2.2 and OpenMPI.

Basic idea for this configuration:

  • Users compile with "mpicc" to use MVAPICH, or with "openmpi-mpicc" to use OpenMPI.
  • We want the MPI libraries integrated with Slurm. This means that commands like "mpiexec" do NOT exist. Instead, jobs have to be launched through Slurm, for example "srun -n 2 myTask".
  • Both libraries use the same interface, PMI2, so Slurm only needs to provide that API to parallel applications. Each application then uses whichever library it was linked against.
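As a rough illustration of the workflow described above, the sketch below writes a minimal MPI program, builds it with the MVAPICH wrapper and launches it through Slurm. The file name helloWorldMPI.c is only an example, and the build and launch steps are skipped when mpicc or srun are not on the PATH.

```shell
# Write a minimal MPI test program (the file name is an example)
cat > helloWorldMPI.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Build with the MVAPICH wrapper and launch two tasks through Slurm,
# but only when both tools are actually installed
if command -v mpicc >/dev/null 2>&1 && command -v srun >/dev/null 2>&1; then
    mpicc helloWorldMPI.c -o helloWorldMPI
    srun -n 2 ./helloWorldMPI
fi
```

To build the same source against OpenMPI instead, replace mpicc with openmpi-mpicc; the srun line stays the same.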

MVAPICH

Compilation

You need to have the Slurm libraries in $LD_LIBRARY_PATH.

echo $LD_LIBRARY_PATH
/mvapich2/lib:/mvapich2/libexec:/mvapich2/include:/dmtcp/lib:/slurm/lib:/slurm/include:

If not, set them in /etc/profile. Including MVAPICH, Slurm and DMTCP, mine looks like:

#SLURM
export PATH=/slurm/bin:/slurm/sbin/:$PATH
export LD_LIBRARY_PATH=/slurm/lib:/slurm/include:$LD_LIBRARY_PATH

#DMTCP
export PATH=/dmtcp/bin:$PATH
export LD_LIBRARY_PATH=/dmtcp/lib:$LD_LIBRARY_PATH

#MVAPICH
export PATH=/mvapich2/bin:$PATH
export LD_LIBRARY_PATH=/mvapich2/lib:/mvapich2/libexec:/mvapich2/include:$LD_LIBRARY_PATH
export MV2_ENABLE_AFFINITY=0
export MV2_DEBUG_SHOW_BACKTRACE=1
export MV2_ON_DEMAND_THRESHOLD=1

#RUNTIME
export LIBRARY_PATH=$LD_LIBRARY_PATH
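A quick way to confirm that the profile took effect is to check each expected directory against $LD_LIBRARY_PATH. The sketch below is one possible check; the helper name path_contains is made up for this example, and the directory list mirrors the profile above, so adjust it to your install.

```shell
# Return 0 if directory $2 is an entry of the colon-separated list $1
# (path_contains is a hypothetical helper, not part of Slurm or MVAPICH)
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report every library directory the profile is expected to provide
for dir in /slurm/lib /dmtcp/lib /mvapich2/lib; do
    if path_contains "${LD_LIBRARY_PATH:-}" "$dir"; then
        echo "ok: $dir"
    else
        echo "missing: $dir"
    fi
done
```

Wrapping the list in colons before matching avoids false positives on directories that merely share a prefix, such as /slurm/lib64.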

A few packages are required:

yum install libibverbs-devel gcc-gfortran byacc

The configure command is:

./configure --prefix=/home/localsoft/mvapich2 --disable-mcast --with-slurm=/home/localsoft/slurm --with-pmi=slurm --with-pm=none --enable-fortran=all --enable-cxx --enable-timing=none --enable-debuginfo --enable-mpit-pvars=all --enable-check-compiler-flags --enable-threads=multiple --enable-weak-symbols --enable-fast-install --enable-g=dbg --enable-error-messages=all --enable-error-checking=all

make 

make install

Errors

If the following error is displayed when running anything:

[root@slurmDev tests]# mpiexec ./helloWorldMPI
Warning! : Core id 32615 does not exist on this architecture!
CPU Affinity is undefined
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:2391
[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(514):
MPID_Init(370).......:

Then this environment variable has to be set. I put it directly in /etc/profile:

export MV2_ENABLE_AFFINITY=0
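Before launching a job it can help to confirm that the variable is really set in the current shell, for example after editing /etc/profile without logging in again. The check below is a small sketch; the function name affinity_disabled is invented for this example.

```shell
# Return 0 only when CPU affinity has been disabled for MVAPICH
# (affinity_disabled is a hypothetical helper name)
affinity_disabled() {
    [ "${MV2_ENABLE_AFFINITY:-}" = "0" ]
}

if affinity_disabled; then
    echo "MV2_ENABLE_AFFINITY=0, affinity is disabled"
else
    echo "MV2_ENABLE_AFFINITY is not 0; export it before launching MPI jobs" >&2
fi
```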

OpenMPI

OpenMPI is configured AFTER mvapich.

Compilation

 ./configure --prefix=/home/localsoft/openmpi --with-pmi=/home/localsoft/slurm/

make

make install

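The problem is that OpenMPI and MVAPICH install commands with the same names. To avoid conflicts, we create a symbolic link to the OpenMPI compiler wrapper under a different name. Then we create a second link to its configuration file, because the wrapper looks up its settings in a file named after the command it was invoked as.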

#in openMPI root folder
mkdir bin_renamed
cd bin_renamed
ln -s ../bin/mpicc openmpi-mpicc

#back up to the openMPI root folder, then into share/openmpi
cd ../share/openmpi
ln -s mpicc-wrapper-data.txt openmpi-mpicc-wrapper-data.txt
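For the renamed wrapper to be usable, bin_renamed has to be on the PATH. The sketch below assumes OpenMPI was installed under the --prefix from the configure line above (the variable name OPENMPI_ROOT is made up here), and appends the directory rather than prepending it so that the MVAPICH commands keep their position.

```shell
# Assumption: OpenMPI lives under the --prefix used at configure time
OPENMPI_ROOT=/home/localsoft/openmpi

# Append the directory that holds only the renamed links
export PATH="$PATH:$OPENMPI_ROOT/bin_renamed"

# If the link resolves, ask the OpenMPI wrapper to print the compiler
# command line it would run, without compiling anything
if command -v openmpi-mpicc >/dev/null 2>&1; then
    openmpi-mpicc --showme
fi
```

If the wrapper complains that it cannot find its data file, the second link in share/openmpi is the first thing to check.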

As a result, Slurm uses the same standard API to launch applications built with either OpenMPI or MVAPICH. Each application sees the library it was linked against and uses it.
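To see which library a given binary will actually use, the resolved path of libmpi in the ldd output is a reasonable hint. The filter below is a sketch: the function name classify_mpi is invented, and it assumes the install directories contain "mvapich2" or "openmpi" in their path, as the prefixes on this page do. The library file name alone may not be enough to tell the two apart, which is why the path is inspected.

```shell
# Read ldd output on stdin and report which install tree libmpi resolves to
# (classify_mpi is a hypothetical helper; path patterns are assumptions)
classify_mpi() {
    awk '/libmpi\.so/ {
        if ($3 ~ /mvapich2/)      print "mvapich2: " $3
        else if ($3 ~ /openmpi/)  print "openmpi: " $3
        else                      print "unknown: " $3
    }'
}

# Example use on a compiled test program, if one is present
if [ -x ./helloWorldMPI ]; then
    ldd ./helloWorldMPI | classify_mpi
fi
```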