Quick Start - naoyam/lbann GitHub Wiki
This is an experimental new version of LBANN. The main branch is distconv.
Build Instructions
To build the distconv version of LBANN, follow the steps below:
1. Obtain the distconv package
The distconv package is currently managed as a separate repository hosted at the LC bitbucket site. Contact @naoyam if you need access to the repository.
2. Build and install distconv
At the top-level directory of distconv, run make install PREFIX=.../somewhere/.... It depends on C++11, CUDA, cuDNN, and MPI. Make sure that all the compilers and libraries use the same versions as those used for building LBANN in the next step.
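To make the version check above easier, the following sketch prints what each tool reports about itself so the output can be compared between the distconv and LBANN builds. The tool names (g++, nvcc, mpicxx) are assumptions about a typical setup and are not taken from the distconv build files; substitute whichever compilers your environment actually uses.

```shell
# Print the version banner of each build tool found on PATH.
# The tool list is an assumption; edit it to match your environment.
report_versions() {
  for tool in g++ nvcc mpicxx; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "== $tool ($(command -v "$tool"))"
      # Some wrappers exit non-zero on --version; do not abort on that.
      "$tool" --version 2>&1 || true
    else
      echo "== $tool: not found on PATH"
    fi
  done
}

report_versions
```

Running this once in the shell used for distconv and once in the shell used for LBANN, then comparing the two outputs, is a quick way to spot a mismatched compiler or CUDA toolkit.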
3. Build LBANN with the installed distconv package
When using build_lbann_lc.sh, adjust the variable DISTCONV_DIR to point to the directory where distconv was installed in the previous step. When running the build script, add the option --with-distconv to enable distconv support.
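Before starting the LBANN build, it can help to confirm that the path you plan to use for DISTCONV_DIR really exists. The helper below is a minimal sketch: the function name check_distconv_dir is hypothetical, and it only tests that the directory is present, since the layout of an installed distconv tree is not described here.

```shell
# Hypothetical pre-flight check: succeed only if the given install
# prefix exists as a directory.
check_distconv_dir() {
  dir="$1"
  if [ -d "$dir" ]; then
    echo "ok: $dir exists"
  else
    echo "error: $dir does not exist; was 'make install PREFIX=...' run?" >&2
    return 1
  fi
}
```

Call it with the same path passed as PREFIX in step 2, and run build_lbann_lc.sh with --with-distconv only if it succeeds.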
System Specific Notes
Ray
On Ray, MVAPICH2-GDR should currently be used instead of Spectrum-MPI because of a large performance difference. MVAPICH2-GDR works without the GPU Direct RDMA capability and includes more advanced optimizations for GPU data transfer. MVAPICH2-GDR is available under /usr/workspace/wsb/brain/mvapich2-gdr. Note that the following environment variables must be set:
MV2_USE_CUDA=1
MV2_USE_GPUDIRECT=0
MV2_USE_GPUDIRECT_GDRCOPY=0
LD_PRELOAD=/usr/workspace/wsb/brain/mvapich2-gdr/2.3a/gcc-4.9.3_cuda-8.0_ppc64le/libmpi.so
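One way to apply the settings above is to export them in the shell or job script before launching LBANN. This is a sketch for a POSIX shell; the variable name MV2_LIB and the file-existence check are additions for illustration, so that LD_PRELOAD is not pointed at a missing library on other systems.

```shell
# Settings copied from the list above.
export MV2_USE_CUDA=1
export MV2_USE_GPUDIRECT=0
export MV2_USE_GPUDIRECT_GDRCOPY=0

# Library path copied from the list above; MV2_LIB is a helper variable
# introduced here, and the existence check is an added safeguard.
MV2_LIB=/usr/workspace/wsb/brain/mvapich2-gdr/2.3a/gcc-4.9.3_cuda-8.0_ppc64le/libmpi.so
if [ -f "$MV2_LIB" ]; then
  export LD_PRELOAD="$MV2_LIB"
else
  echo "warning: $MV2_LIB not found; LD_PRELOAD left unchanged" >&2
fi
```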