Debugging - PIK-LPJmL/LPJmL GitHub Wiki
[[TOC]]
If you are contributing to the code, some errors only show up at runtime. To track them down, you need dedicated debugging tools.
You can compile LPJmL on the command line and debug it with a debugger like GDB (or ddd, a graphical front end for GDB).
Some preconditions:
* Tell the system to generate core files. For jobs submitted via lpjsubmit this is set automatically, but for jobs run locally you need to make sure that ulimit -c is set to unlimited. Either include the line
ulimit -c unlimited
in your .profile or simply execute this command before running the model.
* Test for floating point exceptions, i.e. enable the runtime check for divisions by zero, by including the flag -fpe in the list of arguments, e.g.
lpjml -DFROM_RESTART -fpe lpjml.conf
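As a minimal sketch of the two preconditions above, the following shell helper enables core files only for the command it runs. The name run_with_core is hypothetical and not part of LPJmL, and the commented call assumes that an lpjml executable is on your PATH.

```shell
# Hypothetical helper (not part of LPJmL): run one command with core files enabled.
run_with_core() {
    (
        # The subshell keeps the changed limit local to this one command.
        # Raising the limit fails if the hard limit is lower, so only warn.
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise the core file size limit" >&2
        "$@"
    )
}

# Example call, commented out because it needs a built LPJmL:
# run_with_core lpjml -DFROM_RESTART -fpe lpjml.conf
```

The helper returns the exit status of the wrapped command, so it can be dropped into existing run scripts without changing their error handling.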
On the command line in your $LPJROOT directory, use
make clean
to remove any older executable files, then type
make all
to build new executables. The Makefiles for each sub-directory are already downloaded from the SVN LPJmL trunk. You do not have to write them yourself.
On your local drive, you may also use Eclipse to compile and debug the code. See here how to configure Eclipse.
For debugging, you’ll need different compiler settings, so call the configure script with the -debug flag to adjust the Makefile accordingly:
configure.sh -debug
or, on Windows,
configure.bat -debug
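On a Unix shell, the configure and build steps could be combined roughly as follows. This is only a sketch: the function name rebuild_debug is hypothetical, and it assumes that $LPJROOT points to your LPJmL checkout and that configure.sh lives there.

```shell
# Hypothetical helper (not part of LPJmL): rebuild everything with debug settings.
rebuild_debug() {
    if [ -z "${LPJROOT:-}" ] || [ ! -d "$LPJROOT" ]; then
        echo "rebuild_debug: LPJROOT is not set or not a directory" >&2
        return 1
    fi
    (
        # Work inside a subshell so the caller's working directory is unchanged.
        cd "$LPJROOT" || exit 1
        # Stop at the first step that fails.
        ./configure.sh -debug && make clean && make all
    )
}
```

Calling rebuild_debug without a valid $LPJROOT returns 1 before anything is touched.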
Call
gdb <YOUREXE> <YOURCOREFILE>
to read the core file. Obviously, this only works if you’re using the same executable that generated the core file.
With the where command, gdb prints the call stack at the point of the crash, which lets you trace the origin of the error.
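If you only want the call stack and no interactive session, gdb can run the where command in batch mode. The sketch below wraps this in a hypothetical function core_backtrace (not part of LPJmL); it assumes gdb is installed and uses gdb's -batch and -ex options.

```shell
# Hypothetical helper (not part of LPJmL): print the call stack stored in a core file.
core_backtrace() {
    exe=${1:-}
    core=${2:-}
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <YOUREXE> <YOURCOREFILE>" >&2
        return 2
    fi
    # -batch makes gdb exit once the commands given with -ex have run.
    gdb -batch -ex where "$exe" "$core"
}
```

The output can be redirected to a file, which is handy for attaching a backtrace to a bug report.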
Since the update of the cluster in early 2017, gdb may report an error message about missing “additional debug info”. Ciaron and Roger installed a version of gdb that doesn’t have that problem, which you can use by calling
gdb <YOUREXE> <YOURCOREFILE>
Make sure you’ve asked the system to generate core files (see above). If unsure, check the ulimit settings with
ulimit -a
Since the update of the cluster in early 2017, gdb cannot handle self-written signal handlers. Please remove the line at source:/trunk/src/tools/enablefpe.c#L55
If LPJmL-internal fail commands do not produce proper core files that can be interpreted by gdb, try replacing the abort command in source:trunk/src/tools/fail.c with a deliberate segmentation fault, e.g. define int *ptr=NULL; and, instead of calling abort, assign a value through it:
*ptr=0;
Attention: MPI causes a lot of warning and failure messages with valgrind. To test your code, it is advisable not only to enable MPI usage on the login nodes via
unset I_MPI_DAPL_UD_PROVIDER
but also to compile the code without MPI features by removing
-DUSE_MPI
from the Makefile (either from Makefile.inc directly, or from config/Makefile.cluster2015 and then running
./configure.sh -debug
again).
Then call
make clean
make all
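Removing the flag by hand works fine; as an alternative, the edit could be scripted along these lines. The function strip_mpi_flag is hypothetical, and the sketch assumes that -DUSE_MPI appears in the Makefile as a separate word rather than as the prefix of a longer flag.

```shell
# Hypothetical helper (not part of LPJmL): remove -DUSE_MPI from a Makefile,
# keeping the original as <file>.bak.
strip_mpi_flag() {
    file=${1:-Makefile.inc}
    if [ ! -f "$file" ]; then
        echo "strip_mpi_flag: $file not found" >&2
        return 1
    fi
    # Delete the flag together with any blanks in front of it,
    # reading from the backup and writing the cleaned file in place.
    cp "$file" "$file.bak" &&
        sed 's/[[:space:]]*-DUSE_MPI//g' "$file.bak" > "$file"
}
```

After strip_mpi_flag Makefile.inc, continue with make clean and make all. If you edit config/Makefile.cluster2015 instead, rerun ./configure.sh -debug first.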
For valgrind debugging, you need to load the valgrind module
module load valgrind
and then call
valgrind --leak-check=full --log-file=YOURLOGFILENAME YOURPROG YOURARGS
which could be e.g.
valgrind --leak-check=full --log-file=val.log lpjml -DFROM_RESTART lpjml.conf
For more information on valgrind, see http://valgrind.org/ and
valgrind --help
Valgrind detects memory leaks as well as invalid memory accesses (reading or writing outside array bounds, a very nasty bug to find). It’s a command line tool, so it needs some arguments to do what you want it to do.
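Valgrind logs can get long, so a quick count of the most relevant findings helps to decide where to look first. The following sketch assumes a log written with --log-file and searches for the phrases "Invalid read/write of size" and "definitely lost" that valgrind's default memcheck tool prints. The function summarize_valgrind_log is hypothetical.

```shell
# Hypothetical helper (not part of LPJmL or valgrind): count typical memcheck
# findings in a log written with --log-file.
summarize_valgrind_log() {
    log=${1:-val.log}
    if [ ! -f "$log" ]; then
        echo "summarize_valgrind_log: $log not found" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so ignore its status.
    invalid=$(grep -c -E 'Invalid (read|write) of size' "$log" || true)
    lost=$(grep -c 'definitely lost' "$log" || true)
    echo "invalid accesses: $invalid"
    echo "lines mentioning definitely lost: $lost"
}
```

With the example above, summarize_valgrind_log val.log prints both counts. The details of each finding, including the source lines involved, are still in the log itself.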
Microsoft Visual Studio Debugger (or Eclipse) allows you to run the program step by step and to inspect the sequence of function calls and the contents of variables.
- you can set breakpoints at any line of the source code where you want the debugger to suspend the execution of the program
- to set a breakpoint on a line of code, click in the left margin of the text editor; a red dot should appear
- then start debugging (F5)
- the program will run up to the breakpoint line, and you will be able to inspect its state at that point