Phi Cluster - icl-utk-edu/cluster GitHub Wiki

The Phi cluster consists of 72 compute nodes connected by a QDR InfiniBand network fabric. Each node has two 6-core Intel Xeon processors (12 cores) and 24 GB of RAM, so the full cluster provides 864 CPU cores and about 1.7 TB of RAM.

To use this cluster, you must have an ICL computing account. Log in with SSH to "phi.icl.utk.edu" using your ICL SSH key. Nodes must be allocated, and processes launched, through the Slurm resource manager. Slurm allocates whole nodes, so salloc -N 2 allocates two nodes with 12 CPU cores each. After running salloc, launch processes on the allocated nodes with "srun"; SSH directly to compute nodes is not supported. Run computational jobs only on the compute nodes, never on the login node.

To see the status of the cluster, use Slurm's sinfo command:

sinfo

In this example output, one node, silicon, is down and 65 nodes are available for computing:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up 1-00:00:00      1  down* silicon
all*         up 1-00:00:00     65   idle antimony,argon,barium,beryllium,boron,bromine,cadmium,caesium,calcium,carbon,cerium,chlorine,chromium,dysprosium,erbium,europium,fluorine,gadolinium,gallium,germanium,hafnium,helium,holmium,indium,iodine,iron,krypton,lanthanum,lithium,lutetium,magnesium,manganese,molybdenum,neodymium,neon,nickel,niobium,nitrogen,oxygen,palladium,phosphorus,potassium,praseodymium,promethium,rhodium,rubidium,ruthenium,samarium,scandium,selenium,silver,sodium,strontium,sulfur,technetium,tellurium,terbium,thulium,tin,titanium,vanadium,ytterbium,yttrium,zinc,zirconium

Using MPICH 3

First, load the MPICH 3 module:

module load mpich/3.2.1/gcc-7.2.0-2ggg

Compile your MPI program with the mpicc compiler wrapper:

mpicc a.c
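You can ask mpicc itself whether the module load above put MPICH's wrapper on your PATH, and which underlying compiler and flags it uses. -show is an MPICH wrapper option that prints the compile command without running it. The optional -O2 -Wall flags are only an illustration. -o a.out just names the default executable explicitly so it matches the ./a.out used below.

```shell
# Confirm which mpicc is on PATH after loading the module
command -v mpicc

# Print the full compiler command mpicc would run, without compiling anything
mpicc -show

# Optionally compile with optimization and warnings enabled, producing ./a.out
mpicc -O2 -Wall -o a.out a.c
```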

Run it through Slurm on 65 nodes with one rank per core (12 per node), for a total of 780 MPI ranks:

salloc -N 65 mpiexec ./a.out

You should see output like this:

salloc: Granted job allocation 6580
salloc: Waiting for resource configuration
salloc: Nodes antimony,argon,barium,beryllium,boron,bromine,cadmium,caesium,calcium,carbon,cerium,chlorine,chromium,dysprosium,erbium,europium,fluorine,gadolinium,gallium,germanium,hafnium,helium,holmium,indium,iodine,iron,krypton,lanthanum,lithium,lutetium,magnesium,manganese,molybdenum,neodymium,neon,nickel,niobium,nitrogen,oxygen,palladium,phosphorus,potassium,praseodymium,promethium,rhodium,rubidium,ruthenium,samarium,scandium,selenium,silver,sodium,strontium,sulfur,technetium,tellurium,terbium,thulium,tin,titanium,vanadium,ytterbium,yttrium,zinc,zirconium are ready for job
salloc: Relinquishing job allocation 6580
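The 780-rank figure is the node count times the 12 cores per node. The helper below, ranks_for_nodes, is hypothetical and not a command provided on the cluster. It does that arithmetic so you can pass the rank count to mpiexec explicitly with -n. This sketch assumes you want exactly one rank per core.

```shell
#!/usr/bin/env bash
# Each Phi node has 2 sockets x 6 cores = 12 cores.
CORES_PER_NODE=12

# Hypothetical helper: number of MPI ranks for one rank per core on N nodes.
ranks_for_nodes() {
  local nodes="$1"
  echo $(( nodes * CORES_PER_NODE ))
}

ranks_for_nodes 65   # prints 780
```

For example, after defining the function, the run above can be written with an explicit rank count:

salloc -N 65 mpiexec -n "$(ranks_for_nodes 65)" ./a.out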