troubleshooting - noma/dm-heom GitHub Wiki

Troubleshooting

This is a collection of common, recurring issues, some of which can be prevented by carefully reading the documentation.


Problem: Anything not listed below might be caused by not using bash. All scripts provided with DM-HEOM are developed and tested with bash, as are the commands shown in the documentation.

Solution: Check your shell: echo $0. Quick fix: start bash manually. Long-term solution: change your login shell. You can of course adapt everything to your shell of choice; feedback and contributions are always welcome.
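A minimal check and switch might look like this (assuming bash is installed at /bin/bash):

echo $0                 # prints the current shell, e.g. -bash or /bin/zsh
bash                    # quick fix: start a bash session manually
chsh -s /bin/bash       # long-term fix: change the login shell (takes effect at next login)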


Problem: An exception related to regex is thrown.

Solution:

  • a) The used C++ compiler is too old and thus has incomplete regex support (known for g++ prior to 4.9). Check your ~/.bashrc and make sure it is sourced, check g++ -v, call cmake like this: CXX=`which g++` CC=`which gcc` cmake ../dm-heom, check the compiler version in the output, and check the used tools via make VERBOSE=1.
  • b) The config file is malformed in an unexpected way. If the output is insufficient to narrow the problem down, try a development build, i.e. using cmake with -DCMAKE_BUILD_TYPE=Debug, as shown in the sketch below. Start in a fresh build directory (it does not need to be named build, and you can have as many as you need).
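A fresh debug build might look like this, reusing the compiler selection from above and assuming the source checkout lives in ../dm-heom:

mkdir debug-build && cd debug-build
CXX=`which g++` CC=`which gcc` cmake -DCMAKE_BUILD_TYPE=Debug ../dm-heom
make VERBOSE=1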

Problem: A screen full of Boost-related linker errors.

Solution: Make sure the Boost version you are using at link time has been compiled with the same compiler as DM-HEOM. If you are using a custom-built compiler, e.g. GCC 5.3.0, but the system packages for Boost have been compiled with the system compiler, say GCC 4.8.5, the ABI is not compatible.
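If you need a Boost built with the same custom compiler, a minimal sketch might look like this (the Boost and GCC versions and all paths are just examples):

cd boost_1_63_0
./bootstrap.sh --prefix=$HOME/boost-gcc-5.3
echo "using gcc : 5.3 : $HOME/gcc-5.3.0/bin/g++ ;" > user-config.jam   # point b2 at the custom g++
./b2 --user-config=user-config.jam toolset=gcc-5.3 install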

OpenCL

OpenCL can be tough to set up, especially in an HPC environment. The main concept behind OpenCL is portability, achieved through device-specific implementations, each providing an optimising compiler at runtime for its specific device.

From the application developer's view, OpenCL is a library used within a host application. The application uses OpenCL calls to compile and run pieces of kernel code using some OpenCL implementation present on the target system. This ensures the performance-critical code runs on, and is optimised for, the available hardware (compute device). The latter can be anything for which an OpenCL implementation is available: CPU, GPU, Xeon Phi accelerator, FPGA, etc.

However, this approach and its different implementations have their caveats.

OpenCL Installable Client Driver (ICD)

The ICD mechanism makes sure multiple OpenCL implementations can reside beside each other. This works via the /etc/OpenCL/vendors directory, where every installed SDK puts a small text file with an .icd extension, which contains the path of a library. This library is the actual OpenCL implementation. Each library referenced this way in /etc/OpenCL/vendors represents a so-called OpenCL Platform (i.e. an installed SDK or implementation).

The generic libOpenCL.so the application is linked against is just a loader that implements the so-called OpenCL Platform layer. It merely allows getting a list of available OpenCL platforms (installed SDKs/implementations) and provides the functionality to dynamically and transparently load the implementation library for the platform in use.
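If the clinfo tool happens to be installed, the registered .icd files and the platforms and devices visible through the ICD loader can be inspected like this:

ls /etc/OpenCL/vendors/   # one .icd file per installed SDK/implementation
clinfo -l                 # list platforms and their devices via the ICD loader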

If multiple SDKs are installed, the libOpenCL.so in the system library directory may be overwritten. The last implementation installed then typically provides the platform layer (ICD mechanism). Sadly, every vendor has its own implementation of the ICD mechanism, and platform and device orders can differ. That means the SDK installation order matters.
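To see which libOpenCL.so a binary actually resolves at runtime, standard tools suffice (the binary name below is a placeholder, and the library directory may be /usr/lib or /usr/lib64 depending on the system):

ldd ./your-dm-heom-binary | grep -i libopencl   # which loader the binary picks up
ls -l /usr/lib64/libOpenCL.so*                  # check where the symlink points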

Even more annoying, querying devices via a platform uses the ICD loader libOpenCL.so, while querying the devices of a context uses the actual implementation of the platform the context was created for. This can lead to inconsistent device orders when querying a platform object and a context object for a list of devices within the same application.

The whole ICD mechanism can be circumvented by looking into the *.icd file to find the path of the actual OpenCL implementation and linking directly against it, e.g.

cat /etc/OpenCL/vendors/intel64.icd
/opt/intel/intel-opencl-1.2-5.0.0.43/opencl-1.2-5.0.0.43/lib64/libintelocl.so

To link directly against the library, just add the following to the CMake command line.

-DOpenCL_FOUND=True -DOpenCL_LIBRARY=/opt/intel/intel-opencl-1.2-5.0.0.43/opencl-1.2-5.0.0.43/lib64/libintelocl.so

This approach allows installing an SDK without root access, which would otherwise be needed to write into /etc/OpenCL/ during installation. The *.icd files can typically be found somewhere within the installation package.

Another option for a home installation is the OPENCL_VENDOR_PATH environment variable, which specifies an alternative to /etc/OpenCL/vendors. Sadly, it is not evaluated by all OpenCL implementations, e.g. Intel's. However, the OpenCL ICD loader, i.e. libOpenCL.so, can be replaced by any standard-compliant implementation. The PoCL project provides a patch for the Khronos Group reference ICD loader with support for OPENCL_VENDOR_PATH.
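A home installation using OPENCL_VENDOR_PATH might then look like this, assuming an ICD loader that supports the variable; the SDK path and the .icd file name are just examples:

mkdir -p $HOME/OpenCL/vendors
cp $HOME/opencl-sdk/etc/intel64.icd $HOME/OpenCL/vendors/   # location within the package varies
export OPENCL_VENDOR_PATH=$HOME/OpenCL/vendors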

In short: Either use the CMake command line options for a direct link with a path adapted to your OpenCL installation, or use an ICD loader that supports OPENCL_VENDOR_PATH.


Problem: cl::Platform::get() error (seen on a Xeon Phi (KNL) system).

Solution: Directly link to the Intel OpenCL implementation library instead of the ICD loader.

CXX=`which CC` CC=`which cc` cmake -DCMAKE_BUILD_TYPE=Release -DHEOM_ENABLE_MPI=True -DOpenCL_FOUND=True -DOpenCL_LIBRARY=/sw/tools/opencl/intel/opencl-1.2-5.0.0.57/lib64/libintelocl.so ..

OpenCL implementations

The current version of OpenCL is 2.1 (Nov. 2015), and there is a provisional OpenCL 2.2 specification (April 2016). What is widely supported is OpenCL 1.2 (Nov. 2011).

Some implementations are:

  • Intel, runs on Xeon and other CPUs with AVX2 support
  • AMD, FirePro GPUs and CPUs with AVX2 support
  • NVidia (within the CUDA SDK), support for NVidia GPUs
  • PoCL (Portable OpenCL), LLVM-based open source implementation targeting multiple platforms

See OpenCL Resources for more information.

OpenCL and Threads

On CPU-like compute devices, OpenCL implementations spawn worker threads. If more than one MPI process per compute device is used, e.g. to improve network performance, it is crucial to make sure the thread count per process and the CPU mapping of the threads are reasonable. The thread count for each process can be found within the generated profiling data, see Built-in Profiling.
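As an illustration only, the thread count of a running process can also be checked with standard tools (the process name is a placeholder):

ps -o nlwp= -p $(pgrep -f dm-heom | head -n 1)   # number of threads of the first matching process

To keep the CPU mapping sane with multiple ranks per node, each MPI rank can be pinned to its own core range via a small wrapper script; this sketch assumes Open MPI (which sets OMPI_COMM_WORLD_LOCAL_RANK) and 16 cores per rank, invoked as mpirun -np 2 ./wrapper.sh ./your-dm-heom-binary <args>:

#!/bin/bash
# wrapper.sh: pin this MPI rank to a disjoint range of cores
CORES_PER_RANK=16
FIRST=$(( OMPI_COMM_WORLD_LOCAL_RANK * CORES_PER_RANK ))
LAST=$(( FIRST + CORES_PER_RANK - 1 ))
exec taskset -c ${FIRST}-${LAST} "$@"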