Issues - N-BodyShop/changa GitHub Wiki

Issues with using ChaNGa and how to address them are listed here.

If you have issues not addressed here, please report them as an issue on the ChaNGa GitHub page.

Table of Contents

* Compiler Problems
* Problems on Startup: understanding charmrun
* Runtime Asserts and Aborts
* Out of Memory
* Deadlocks
* CUDA Specific Issues

Compiler Problems

There is an issue with GCC 6.1.X and 6.2.X and Charm++, evidently an over-optimization that results in a crash immediately after reading in the particles. To work around this, either use an earlier compiler version or add -fno-lifetime-dse to the charm build command. See https://charm.cs.illinois.edu/redmine/issues/1045 for more details.
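
For example, if you build charm++ with its ./build script, the flag can simply be appended so that it is passed through to gcc. This is only a sketch; the build target and other options below are placeholders for whatever you normally use:

    # Placeholder build line: keep your usual target and options, just append the flag.
    ./build ChaNGa netlrts-linux-x86_64 --with-production -j8 -fno-lifetime-dse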

Problems on Startup: understanding charmrun

For the "net" builds of charm++/ChaNGa, the common problem is starting ChaNGa on multiple nodes of your compute cluster. For MPI and other builds, this is taken care of by the cluster infrastructure, but for net builds, you are directly facing this problem.

"charmrun", which gets built when you "make" ChaNGa, is the program that handles this. If your cluster does have MPI installed, the easiest way to start things up is with

    charmrun +p<procs> ++mpiexec ChaNGa cosmo.param

However, if mpiexec is not the way you start an MPI program on your cluster, then you may need to write a wrapper. E.g., for the TACC clusters (Stampede and Lonestar) a wrapper would contain:

    #!/bin/csh
    shift; shift; exec ibrun $*

and you would call it with:

    charmrun +p<procs> ++mpiexec ++remote-shell mympiexec ChaNGa cosmo.param
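
For reference, here is a sketch of how this might sit inside a batch script on such a system. The scheduler directives and core counts are illustrative, and mympiexec is the two-line csh wrapper above, made executable with chmod +x and kept in the run directory:

    #!/bin/bash
    #SBATCH -N 2          # nodes (illustrative)
    #SBATCH -n 32         # total tasks (illustrative)
    #SBATCH -t 02:00:00
    charmrun +p32 ++mpiexec ++remote-shell ./mympiexec ChaNGa cosmo.param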

If MPI is not available, then charmrun will look at a nodelist file which has the format:

    group main
      host node1
      host node2

In order for this to work, you need to be able to ssh into those nodes without a password. If your cluster is not set up to enable this by default, set up passwordless login using public keys. If you have interactive access to the compute nodes (e.g. with qsub -I), then a quick way to test this within the interactive session is to execute the command ssh node1 $PWD/ChaNGa. If ChaNGa starts and gives a help message, then things are set up correctly. Otherwise the error message can help you diagnose the problem. Potential problems include: host keys not installed, user public keys not installed, and shared libraries not accessible.
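
Putting this together on a PBS/Torque cluster (a sketch; $PBS_NODEFILE and the awk recipe are assumptions for that scheduler, so adapt them to yours), you can generate the nodelist from the allocated nodes and point charmrun at it with ++nodelist:

    # Build a nodelist from the nodes the scheduler allocated, then launch.
    echo "group main" > nodelist
    sort -u $PBS_NODEFILE | awk '{print "  host " $1}' >> nodelist
    charmrun +p<procs> ++nodelist nodelist ChaNGa cosmo.param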

Runtime Asserts and Aborts

Some messages are extraneous. One example is:

    Warning> Randomization of virtual memory (ASLR) is turned on in the kernel, thread migration may not work!
    Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try running with '+isomalloc_sync'.

You do not need to add +isomalloc_sync to your command line; ChaNGa handles thread migration in another way.

There are many sanity checks within the code using the assert() call. Here are some common ones with explanations of what has gone wrong.

   Assertion "bInBox" failed in file TreePiece.cpp line 622

This happens when running with periodic boundary conditions and a particle is WAY outside the fiducial box. This is an indication of bad initial conditions or "superluminal" velocities.

    ------------- Processor 0 Exiting: Called CmiAbort ------------
    Reason: SFC Domain decomposition has not converged

Here domain decomposition has failed to divide the particles evenly among the domains to within a reasonable tolerance. This could be due to a pathological particle distribution, such as having all particles on top of each other. One solution is to loosen the tolerance by increasing the "ddTolerance" constant in ParallelGravity.h and recompiling. If the above message is also accompanied by many messages like:

    Truncated tree with 17 particle bucket
    Truncated tree with 26 particle bucket

then larger sorting keys may be needed. Try running configure with "--enable-bigkeys", and recompiling.
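
A sketch of that rebuild, run from the ChaNGa source directory together with whatever other configure options you normally use:

    # Reconfigure with larger sorting keys and rebuild.
    ./configure --enable-bigkeys
    make clean
    make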

    ------------- Processor 0 Exiting: Called CmiAbort ------------
    Reason: [CkIO] llapi_file_get_stripe error

This is a recent (2019) error on Pleiades with a newer implementation of the Lustre file system. The "stripe" refers to how a file is split across many disks for high I/O performance. The workaround (until the Charm interface to Lustre catches up with the newer Lustre API) is to explicitly set the striping on the directory in which the snapshots are being written. An example command is lfs setstripe -S 1048576 -c 4 . where the final "." refers to the current directory, so this command should be run in the directory in which the snapshots are written. Update, March 1, 2019: this problem is now appearing on Blue Waters and Stampede2. The same workaround is applicable on these systems; meanwhile, the charm development team is working on a true fix.
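
For example (the directory path below is a placeholder; run the commands in the directory where snapshots are actually written):

    cd /path/to/snapshot/directory    # placeholder for your output directory
    lfs setstripe -S 1048576 -c 4 .   # 1 MB stripes across 4 OSTs
    lfs getstripe .                   # optional check that the layout took effect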

    ------------- Processor 0 Exiting: Called CmiAbort ------------
    Reason: starlog file format mismatch

The starlog file starts with a number that is the size of each starlog event. ChaNGa checks this number against what it thinks the starlog event size is and issues this complaint if they don't match. The two obvious reasons for a mismatch are: 1) the starlog file is corrupt or 2) ChaNGa has been recompiled with a different configuration (e.g. H2 cooling vs no H2 cooling) in the middle of a run.

In either case the quickest way to get going again is to move the starlog file out of the way, and restart from an output.
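
For example (a sketch; the starlog file name depends on your run, so substitute whatever .starlog file sits in your run directory):

    # Move the stale starlog aside, then restart ChaNGa from the last good output.
    mv mysim.starlog mysim.starlog.old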

Out of Memory

Memory use can be an issue in large simulations. One of the current big uses of memory in ChaNGa is the caching of off-processor data. This can be lowered by decreasing the depth of the cache "lines" with "-d" or "nCacheDepth". The default is 4, and the size of a line scales as 2^d: higher values mean more remote data is fetched at once, reducing latency costs at the price of higher memory use.
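
For example, to drop the depth from the default 4 to 3 (a sketch; the process count is a placeholder):

    charmrun +p<procs> ChaNGa -d 3 cosmo.param

The same setting should also be accepted in the parameter file as nCacheDepth = 3.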

Deadlocks

Deadlocks are hard to track down. One common deadlock is that a process gets held up in a lock within malloc() or free(). This will happen if you link with "-memory os" instead of using the charm++ default memory allocator, and the OS malloc is not thread-safe.

CUDA Specific Issues

The CUDA implementation is still experimental. A common error is:

    Fatal CUDA Error all CUDA-capable devices are busy or unavailable at cuda-hybrid-api.cu:571

This means either 1) there are no GPUs on the host, or 2) more than one process is trying to access the GPU. For scenario 2, you might have more than one ChaNGa process on the host competing for the GPU. Either run in SMP mode with only one process per GPU host, or use the CUDA Multi-Process Service (CUDA MPS) to handle this situation. For Cray machines, setting the environment variable CRAY_CUDA_MPS=1 enables this. However, many compute clusters do not support this.
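
Two sketches of the alternatives above (the process and thread counts and the launcher are illustrative; adapt them to your site):

    # SMP build: one ChaNGa process per GPU node with many worker threads.
    # Here 60 worker PEs split as 4 processes x 15 PEs each (illustrative).
    charmrun +p60 ++ppn 15 ChaNGa cosmo.param

    # Cray systems: allow several processes to share a GPU through CUDA MPS.
    export CRAY_CUDA_MPS=1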
