Quick Start - naoyam/lbann GitHub Wiki

This is an experimental new version of LBANN. The main branch is distconv.

Build Instructions

To build the distconv version of LBANN, follow the steps below:

1. Obtain the distconv package

The distconv package is currently managed as a separate repository hosted on the LC Bitbucket site. Contact @naoyam if you need access to the repository.

2. Build and install distconv

From the top-level directory of distconv, run make install PREFIX=.../somewhere/.... distconv depends on C++11, CUDA, cuDNN, and MPI. Make sure that all compilers and libraries are the same versions as those used to build LBANN in the next step.
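As a rough sketch of this step, the install command can be wrapped in a small helper. The prefix `$HOME/opt/distconv`, the variable name `DISTCONV_PREFIX`, and the function name `install_distconv` are illustrative assumptions, not part of distconv; substitute whatever install location suits your system.

```shell
# Hypothetical install prefix; the instructions above leave the actual path elided.
DISTCONV_PREFIX="$HOME/opt/distconv"

# Hypothetical helper: $1 is the top-level directory of the distconv checkout.
install_distconv() {
  # Run in a subshell so the caller's working directory is left unchanged.
  (cd "$1" && make install PREFIX="$DISTCONV_PREFIX")
}
```

It would be called with the path to your checkout, for example `install_distconv ~/src/distconv` (the checkout path is also just an example).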

3. Build LBANN with the installed distconv package

When using build_lbann_lc.sh, set the variable DISTCONV_DIR to the directory where distconv was installed in the previous step. When running the build script, add the option --with-distconv to build with distconv enabled.
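The following is a minimal sketch of this step, not a definitive recipe. It assumes that build_lbann_lc.sh reads DISTCONV_DIR from the environment; if the script assigns the variable internally, edit that assignment in the script instead. The path `$HOME/opt/distconv` and the function name `build_lbann_with_distconv` are hypothetical.

```shell
# Assumption: the same (hypothetical) prefix that distconv was installed to.
DISTCONV_DIR="$HOME/opt/distconv"

# Hypothetical helper: $1 is the path to build_lbann_lc.sh in your LBANN tree.
build_lbann_with_distconv() {
  # Fail early if the distconv installation directory is missing.
  if [ ! -d "$DISTCONV_DIR" ]; then
    echo "error: distconv not found at $DISTCONV_DIR" >&2
    return 1
  fi
  # Assumption: the script picks DISTCONV_DIR up from its environment.
  DISTCONV_DIR="$DISTCONV_DIR" "$1" --with-distconv
}
```

Checking the directory first catches a mistyped path before the build starts rather than partway through it.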

System Specific Notes

Ray

On Ray, MVAPICH2-GDR should currently be used instead of Spectrum-MPI because of a large performance difference. MVAPICH2-GDR works even without the GPU Direct RDMA capability and includes more advanced optimizations for GPU data transfer. MVAPICH2-GDR is available under /usr/workspace/wsb/brain/mvapich2-gdr. Note that the following environment variables must be set:

MV2_USE_CUDA=1
MV2_USE_GPUDIRECT=0
MV2_USE_GPUDIRECT_GDRCOPY=0
LD_PRELOAD=/usr/workspace/wsb/brain/mvapich2-gdr/2.3a/gcc-4.9.3_cuda-8.0_ppc64le/libmpi.so
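One way to apply these settings in a shell session or job script is sketched below. The variable name `MV2_LIB` and the existence check are additions for illustration; the values themselves are the ones listed above.

```shell
# Library path as listed above; MV2_LIB is a hypothetical helper variable.
MV2_LIB=/usr/workspace/wsb/brain/mvapich2-gdr/2.3a/gcc-4.9.3_cuda-8.0_ppc64le/libmpi.so

export MV2_USE_CUDA=1
export MV2_USE_GPUDIRECT=0
export MV2_USE_GPUDIRECT_GDRCOPY=0

# Preload the library only if it exists, so a wrong path does not make
# every later command print a dynamic-loader error.
if [ -f "$MV2_LIB" ]; then
  export LD_PRELOAD="$MV2_LIB"
else
  echo "warning: $MV2_LIB not found; LD_PRELOAD left unset" >&2
fi
```

Because LD_PRELOAD affects every program started from the shell, it may be preferable to set these variables only in the script that launches LBANN rather than in a login profile.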