# Running HPX on SuperMIC
SuperMIC has a total of 382 nodes, each with two 10-core 2.8 GHz Intel Ivy Bridge-EP processors. 380 compute nodes each have 64 GB of memory and 500 GB of local HDD storage. 360 of the compute nodes have 2 Intel Xeon Phi 7120P (Knights Corner, also known as KNC) coprocessors. 20 of the compute nodes have 1 Intel Xeon Phi 7120P coprocessor and 1 NVIDIA Tesla K20X. (HPC@LSU)
SuperMIC uses Torque 3.0.6 as its batch scheduler. The official documentation for SuperMIC can be viewed at http://www.hpc.lsu.edu/resources/hpc/system.php?system=SuperMIC.
You can create an account, request to join allocations, request new allocations, and see remaining balances on LSU HPC's account management page.
SuperMIC provides its available software through modules. The list of available modules can be viewed on the machine itself by running `module avail`, or in the cluster's software documentation.
- `module load <module_1> [<module_2> [<module_3> ...]]`: loads module(s) `<module_1>`, etc. into the current session. Equivalent to `module add <module_1> [<module_2> [<module_3> ...]]`. E.g. `module load GCC/4.9.0 python/2.7.7/GCC-4.9.0`
- `module unload <module_1> [<module_2> [<module_3> ...]]`: unloads module(s) `<module_1>`, etc. from the current session. Equivalent to `module rm <module_1> [<module_2> [<module_3> ...]]`. E.g. `module unload intel`
- `module swap <module_1> <module_2>`: unloads module `<module_1>` and loads `<module_2>`. Typically used when two modules conflict with each other. Equivalent to `module switch <module_1> <module_2>`
- `module list`: lists all modules loaded in the current session
- `module purge`: unloads all modules loaded in the current session
For more documentation, consult the `module` man page.
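For illustration, a short session that inspects and adjusts loaded modules might look like the following (module names are taken from the sections below; adapt them to what you actually need):

```bash
module list                              # show what is currently loaded
module load boost/1.55.0/INTEL-14.0.2    # load the Boost build for Intel 14
module swap INTEL/14.0.2 INTEL/15.0.0    # switch from the default Intel 14 to Intel 15
module purge                             # unload everything and start clean
```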
HPX 0.9.10 is available as a module on SuperMIC:
- `hpx/0.9.10/impi-4.1.3.048-intel64` is HPX 0.9.10 compiled with Intel 14.0.2 and Intel MPI 4.1.3
- `hpx/0.9.10/impi-4.1.3.048-intel64-mic` is a Xeon Phi build of HPX 0.9.10 compiled with Intel 14.0.2 and Intel MPI 4.1.3
- `hpx/0.9.10/mvapich2-2.0-INTEL-14.0.2` is HPX 0.9.10 compiled with Intel 14.0.2 and MVAPICH2 2.0
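To use one of these builds, load the corresponding module. Depending on how the module files are set up, you may also need to load the matching MPI module yourself, e.g.:

```bash
module load impi/4.1.3.048/intel64
module load hpx/0.9.10/impi-4.1.3.048-intel64
```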
Intel, PGI, and GCC compilers are available on SuperMIC, with Intel 14.0.2 being the default compiler.

- Intel 14 is available as `INTEL/14.0.2`. It is loaded by default.
- Intel 15 is available as `INTEL/15.0.0`.
- GCC 4.9.0 is available as `gcc/4.9.0`.
Boost libraries

- Boost 1.55 can be loaded with `module load boost/1.55.0/INTEL-14.0.2`.
- If you need Boost built with a different compiler or configuration, or a different version of Boost, you have to download and compile it yourself (a rough sketch is shown below). @brycelelbach's script used to work for many users.
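If you do need your own Boost, a minimal sketch of building it with the Intel compiler follows; the Boost version, download URL, and install prefix are illustrative assumptions, not values prescribed by this guide:

```bash
# Fetch and unpack a Boost release (version and URL are illustrative)
wget https://sourceforge.net/projects/boost/files/boost/1.57.0/boost_1_57_0.tar.bz2
tar xjf boost_1_57_0.tar.bz2
cd boost_1_57_0

# Build with the Intel toolset and install into $HOME/boost
./bootstrap.sh --with-toolset=intel-linux --prefix=$HOME/boost
./b2 -j20 install
```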
MPI Libraries

- MVAPICH2 2.0 is available as `INTEL-140-MVAPICH2/2.0`.
- Intel MPI 4.1.3 is available as `impi/4.1.3.048/intel64`.
- Intel MPI 5.0.1.035 is available as `impi/5.0.1.035/intel64`.
- MPICH 3.0.3 is available as `mpich/3.0.3/INTEL-14.0.2`.
- MPICH 3.1.1 is available as `mpich/3.1.1/INTEL-14.0.2` or `INTEL-140-MPICH/3.1.1`.
- OpenMPI 1.8.4 is available as `openmpi/1.8.4/INTEL-14.0.2`.
Other libraries and tools:

- hwloc 1.10.0 is available as `hwloc/1.10.0/INTEL-14.0.2`.
- HDF5 1.8.12 is available as `hdf5/1.8.12/INTEL-140-MVAPICH2-2.0`.
- libunwind, jemalloc, and gperftools are not currently available as modules and have to be downloaded and compiled from their respective developers' web pages.
- CMake 2.8.12 is available as `cmake/2.8.12/INTEL-14.0.2`.
- Python 2.7.7 is available as `python/2.7.7/GCC-4.9.0`.
- The Anaconda Python distribution is available as `python/2.7.7-anaconda`. Anaconda Python comes with many scientific packages and tools pre-installed. For more information about Anaconda Python, visit http://docs.continuum.io/anaconda/index.html.
- Valgrind 3.9.0 is available as `valgrind/3.9.0/GCC-4.9.0`.
- DDT 4.2.1 is available as `ddt/4.2.1`.
- TotalView 8.12.1 is available as `totalview/8.12.1`.
For information about the compilation process, take a look at the HPX Manual and the Build recipes in the HPX documentation.
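As a rough sketch (not the official recipe), configuring and building HPX from source against the modules listed above could look like this; all paths, the module selection, and the parallel build width are assumptions to adapt to your own checkout and installations:

```bash
# Load the compiler, MPI, and build tools listed above (Intel 14 is loaded by default)
module load cmake/2.8.12/INTEL-14.0.2 INTEL-140-MVAPICH2/2.0 \
            boost/1.55.0/INTEL-14.0.2 hwloc/1.10.0/INTEL-14.0.2

# Configure and build out of source (paths are illustrative; set BOOST_ROOT and
# HWLOC_ROOT explicitly if CMake cannot locate the module installations)
mkdir -p $HOME/hpx/build && cd $HOME/hpx/build
cmake $HOME/hpx/source \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=$HOME/hpx/install
make -j20
make install
```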
To get an interactive development shell on one of the nodes you can issue the following command:

```
$ qsub -A <allocation-id> -I -q <desired-queue> -l nodes=<number-of-nodes>:ppn=20 -l walltime=<wall-time>
```

Here `<allocation-id>` is your allocation name, `<number-of-nodes>` is the number of nodes you would like, `<desired-queue>` is the queue you want to use, and `<wall-time>` is the maximum session time in HH:MM:SS format. `ppn=20` cannot be changed: each node on SuperMIC has 20 cores, and the cluster's policy requires you to specify that number so that whole nodes are allocated. After the shell has been acquired, you can run your HPX application; by default, it uses all available cores. Note that if you requested only one node, you don't need to use `mpirun` or `pbsdsh`.
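For instance, a two-node, one-hour interactive session in the `workq` queue could be requested as follows (the allocation name `hpc_myalloc` is a placeholder):

```
$ qsub -A hpc_myalloc -I -q workq -l nodes=2:ppn=20 -l walltime=01:00:00
```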
The above-mentioned method of running HPX applications is fine for development purposes. The disadvantage of interactive sessions is that they only return once the application has finished, which might not be appropriate for longer-running applications (for example benchmarks or larger-scale simulations). To cope with that limitation we use batch jobs.

For a batch job you need a script that the scheduler can run once the requested resources are available. To request resources you either add `#PBS` comments to your script or provide the necessary parameters to `qsub` directly. The commands you need to execute are the same ones you would use to start your application from an interactive shell.
Example batch script

The following example script, `example.pbs`, runs `hello_world` on 2 nodes and schedules it in the `workq` queue:

```bash
#!/bin/bash
#PBS -q workq
#PBS -l nodes=2:ppn=20
#PBS -l walltime=00:05:00
#PBS -o example.out
#PBS -e example.err
#PBS -j oe
#PBS -N ExampleJob

uniq $PBS_NODEFILE >actual.nodes
unset PBS_NODEFILE
mpirun -f actual.nodes ./build/bin/hello_world
```

To schedule the script, run the following:

```
$ qsub example.pbs
```
Running TCP HPX applications on SuperMIC can be done using the `pbsdsh` command.
Note: `pbsdsh` does not pass some important environment variables (such as `LD_LIBRARY_PATH`) to the application. Wrapping the execution in a setup script that prepares the environment is one solution to this problem. One such script looks like this:

```bash
#!/bin/bash
# File Name: env.sh
export LD_LIBRARY_PATH="/usr/local/packages/mvapich2/2.0/INTEL-14.0.2/lib:/usr/local/compilers/Intel/cluster_studio_xe_2013.1.046/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/usr/local/compilers/Intel/cluster_studio_xe_2013.1.046/composer_xe_2013_sp1.2.144/mkl/lib/intel64"
"$@"
```

This script needs to have execute permission (which can be set with `chmod +x <setup-script>`).
To run a TCP application, the following command may be used:

```
$ pbsdsh -u <setup-script> <hpx-application> --hpx:nodes=$(cat $PBS_NODEFILE) --hpx:endnodes <hpx-application-arguments>
```

Here `<setup-script>` is the absolute path to the setup script, `<hpx-application>` is the application, and `<hpx-application-arguments>` contains the arguments that are passed to the application.
Example:

```
$ pbsdsh -u $HOME/hpx/env.sh ./build/bin/hello_world --hpx:nodes=$(cat $PBS_NODEFILE) --hpx:endnodes
```
When run under PBS, HPX determines which nodes it is running on by opening the file `$PBS_NODEFILE` points to and examining its contents. However, SuperMIC makes this file available only to the PBS session that is running the job, meaning that not all HPX instances may be able to access it. If you run into this issue, we have two solutions for it:

Put the list of nodes in `<node_file>`:

```
$ uniq $PBS_NODEFILE ><node_file>
$ unset PBS_NODEFILE
$ export HPX_NODEFILE=<node_file>
```

Use `mpirun` to run the HPX application and pass the node file as the file containing the list of nodes:

```
$ mpirun -f $HPX_NODEFILE <hpx-application>
```
Example:

```
$ uniq $PBS_NODEFILE >actual.nodes
$ unset PBS_NODEFILE
$ mpirun -f actual.nodes ./build/bin/hello_world
```
The following command can be used to run HPX applications with the MPI parcelport:

```
$ mpirun_rsh -ssh -np $PBS_NUM_NODES $(uniq $PBS_NODEFILE) <hpx-application>
```

Example:

```
$ mpirun_rsh -ssh -np $PBS_NUM_NODES $(uniq $PBS_NODEFILE) ./build/bin/hello_world
```
The following queues are available on SuperMIC:
Queue | Walltime (hh:mm:ss) | Nodes | Max Allocation Allowed | Comment |
---|---|---|---|---|
workq | 72:00:00 | 128 | 128 | Regular Queue. Nodes have 2×Xeon Phi 7120P |
checkpt | 72:00:00 | 200 | 160 | Nodes have 2×Xeon Phi 7120P |
hybrid | 72:00:00 | 8 | - | Nodes have 1×Xeon Phi 7120P and 1×NVIDIA Tesla K20X. Not available through XSEDE. |
priority | 168:00:00 | 128 | - | Does not seem to be available for regular users |
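To target one of these queues from a batch script, set the queue name in the `#PBS -q` directive; the `checkpt` queue and the resource values below are purely illustrative:

```bash
#PBS -q checkpt
#PBS -l nodes=4:ppn=20
#PBS -l walltime=12:00:00
```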
Use `qstat` to check the status of a job. This returns a status report including CPU, memory, and wall-time usage of all jobs that are either queued or running.

To view the jobs of a particular user: `qstat -u <user-name>`, e.g. to view your own jobs: `qstat -u $USER`.

To see queue information: `qstat -q`.

If the `-f` flag is used, `qstat` shows the resources used, with statistics aggregated across the nodes the job is running on. The `-a` flag shows wall-time in hours and minutes.
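A few typical invocations, using 11111 as a placeholder job id:

```bash
qstat -u $USER     # your own queued and running jobs
qstat -q           # per-queue information
qstat -f 11111     # full resource report for job 11111
qstat -a 11111     # wall-time shown in hours and minutes
```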
Use `qshow` to check resource utilization on the nodes allocated to a job.

Example:

```
$ qshow 11111
PBS job: 11111, nodes: 20
Hostname  Days  Load   CPU  U#  (User:Process:VirtualMemory:Memory:Hours)
smic001      9  0.11   609  26  parsa:1d_stencil_8:12.3G:11G parsa:pbs_demux:13M:1M parsa:mpirun:95M:7M parsa:hydra_pmi_proxy:94M:7M
smic002      9  0.06   493   4  parsa:1d_stencil_8:10.9G:10G parsa:hydra_pmi_proxy:96M:7M
smic003      9  1.45   490   4  parsa:1d_stencil_8:11.8G:11G parsa:hydra_pmi_proxy:96M:7M
smic004      9  0.00   482   4  parsa:1d_stencil_8:12.0G:11G parsa:hydra_pmi_proxy:96M:7M
smic005      9  0.00   489   4  parsa:1d_stencil_8:12.2G:11G parsa:hydra_pmi_proxy:96M:7M
smic006      9  1.30   490   4  parsa:1d_stencil_8:11.4G:11G parsa:hydra_pmi_proxy:96M:7M
smic007      9  1.27   490   4  parsa:1d_stencil_8:11.8G:11G parsa:hydra_pmi_proxy:96M:7M
smic008      9  3.07   169   4  parsa:1d_stencil_8:7.3G:6.8G parsa:hydra_pmi_proxy:96M:7M
smic009      9  1.44   509   4  parsa:1d_stencil_8:11.4G:10G parsa:hydra_pmi_proxy:96M:7M
smic010      9  1.37   481   4  parsa:1d_stencil_8:11.4G:10G parsa:hydra_pmi_proxy:96M:7M
smic011      9  1.47   485   4  parsa:1d_stencil_8:11.9G:11G parsa:hydra_pmi_proxy:96M:7M
smic012      9  1.26   489   4  parsa:1d_stencil_8:12.5G:12G parsa:hydra_pmi_proxy:96M:7M
smic013      9  1.30   479   4  parsa:1d_stencil_8:11.6G:11G parsa:hydra_pmi_proxy:96M:7M
smic014      9  1.25   486   4  parsa:1d_stencil_8:10.8G:10G parsa:hydra_pmi_proxy:96M:7M
smic015      9  1.15   493   4  parsa:1d_stencil_8:10.5G:10G parsa:hydra_pmi_proxy:96M:7M
smic016      9  0.40   485   4  parsa:1d_stencil_8:11.7G:11G parsa:hydra_pmi_proxy:96M:7M
smic017      9  1.19   473   4  parsa:1d_stencil_8:11.4G:11G parsa:hydra_pmi_proxy:96M:7M
smic018      9  0.00   457   4  parsa:1d_stencil_8:11.4G:10G parsa:hydra_pmi_proxy:96M:7M
smic019      9  0.00   480   4  parsa:1d_stencil_8:11.3G:10G parsa:hydra_pmi_proxy:96M:7M
smic020      9  1.09   480   4  parsa:1d_stencil_8:12.2G:11G parsa:hydra_pmi_proxy:96M:7M
PBS_job=11111 user=parsa allocation=hpc_supermic01 queue=workq
total_load=19.18 cpu_hours=0.46 wall_hours=0.06 unused_nodes=0 total_nodes=20 ppn=20
avg_load=0.95 avg_cpu=475% avg_mem=10760mb avg_vmem=12171mb
top_proc=parsa:1d_stencil_8:smic001:12.3G:11G:0.0hr:607% toppm=cchukw1:test38:smic001:229M:140M
node_processes=4
```
If your job has not started yet, you can edit some of its attributes until it starts using the `qalter` command. This is useful when SuperMIC is busy and your job would lose its place in the queue if you canceled it and enqueued another one. The script you passed as the argument, however, cannot be changed with this command. The syntax is:

```
$ qalter [options ...] <job-id>
```

For instance, to change the wall-time limit of job 11111 to 5 hours:

```
$ qalter -l walltime=5:00:00 11111
```
When you submit a job it will be queued, and depending on the current status of the queues it might spend some time waiting until the resources are granted to it. `showstart <job-id>` gives you a rough estimate of the start time, which could be completely off; for instance, if it shows exactly midnight two or three days in the future, the estimate is meaningless.

Example:

```
$ showstart 11111
```
To cancel a job, use `qdel <job-id>`, where `<job-id>` is the id of the job.
For more information about job scheduling, take a look at the How to Use HPX Applications with PBS section of the HPX documentation, or visit http://www.hpc.lsu.edu/docs/pbs.php.