HowTo - PIK-LPJmL/LPJmL GitHub Wiki

HowTo - Getting started with LPJmL

Linux Terminal Commands

If you work on a Unix computer, you are probably already familiar with the shell and can skip to Download LPJmL. As a Windows user connecting to the cluster, you may first need to familiarize yourself with some shell commands: Linux-Terminal-Commands

Download LPJmL

We currently host the LPJmL code on GitHub. For an introduction to GitHub, see [the git userguide](git userguide).

Prepare the model for the run

You should now have a copy of the LPJmL code. If you enter the folder (the LPJROOT folder) and list its contents with ls, the result should look roughly like this:

$ ls 
AUTHORS        COPYRIGHT   input_crumonthly.conf  lpj.conf           magic.mgc   R
bin            doc         input_fms.conf         lpjml.conf         Makefile    README
config         html        input_netcdf.conf      lpjml_fms.conf     man         REFERENCES
configure.bat  include     INSTALL                lpjml_image.conf   par         src
configure.sh   input.conf  LICENSE                lpjml_netcdf.conf  param.conf  VERSION

Most of the code resides in the folder “src”. Parameters can be found in “par”. The *.conf files configure the model. “bin” contains the executable files that the compiler creates.
The code is written in the programming language C (files ending in “.c”), which humans can read and modify. To run the program on a computer, a compiler has to translate it into machine code. The compiler produces code that is specific to the machine on which you want to run it. This has two implications: you need to recompile every time you change something in the code (except for parameters that are not hardcoded but read at runtime), and every time you want to run the model on a new machine.

Setup

LPJmL depends on several standard libraries and compiler settings. Please consult configure.sh and the Makefile templates in the folder config and adjust them to your local setup.

Next, on Linux-based systems, go to your LPJROOT directory. This is the folder where your LPJmL code resides (Get the model running).
Run configure.sh, which configures your Makefile.inc for your system. Remember that every compiled executable is specific to the machine on which it was compiled.

./configure.sh

Error during configure

If the configure script exits with the message “Unsupported operating system”,
Makefile.$osname is created from Makefile.gcc and probably has to be
modified for your operating system/compiler.
If the configure script finds an MPI environment, a parallel version of lpjml is built.
The configure script creates a copy of one of the following OS-specific makefiles
from the directory config:

Makefile.aix         - IBM AIX settings (xlc compiler)
Makefile.aix_mpi     - IBM AIX and MPI environment
Makefile.gcc         - GNU C-compiler settings
Makefile.darwin_gcc  - GNU C-compiler settings for MacOS X
Makefile.intel       - Intel C-compiler settings
Makefile.intel_mpi   - Intel C-compiler and Intel MPI settings
Makefile.cluster2015 - Intel C-compiler and Intel MPI on HLRS2015 cluster at PIK
Makefile.mpich       - GNU C-Compiler and MPI Chameleon settings
Makefile.win32       - Windows settings (used by configure.bat)

Compilation

Compilation on Unix (Linux computer, or cluster)

Run

make

to compile just the LPJmL executable (it will be stored in the bin subfolder), or

make all

to also compile all the utility programs (their executables are also stored in the bin folder). Libraries built from the individual subdirectories are placed in the lib directory.
Run

make clean

to remove all object files and libraries if you want a fresh start before running make or make all.
Compilation can be sped up by using multiple threads (but avoid using more threads than are available on one node):

make -j16 all

on the cluster (which has 16 threads per node); use -j2 on a 2-core local machine, and so on.
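
The rule of thumb above (never request more parallel jobs than one node offers) can be sketched as a small Python helper. This is only an illustration: the function name `make_command` is hypothetical and not part of LPJmL, and `os.cpu_count()` reports the CPUs of the machine the script runs on, which on a cluster login node may differ from a compute node.

```python
import os


def make_command(requested_jobs, available_threads=None, target="all"):
    """Build a make command whose -j value never exceeds the available threads."""
    if available_threads is None:
        # os.cpu_count() can return None when the count cannot be determined.
        available_threads = os.cpu_count() or 1
    # Cap the request at the available threads, but always use at least one job.
    jobs = max(1, min(requested_jobs, available_threads))
    return ["make", f"-j{jobs}", target]


if __name__ == "__main__":
    # 16 threads per node on the cluster, 2 cores on a small local machine.
    print(" ".join(make_command(16, available_threads=16)))
    print(" ".join(make_command(16, available_threads=2)))
```

The returned list can be passed to `subprocess.run` from the LPJROOT directory if you prefer to start the build from a script.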

Compilation on Windows computer

Please see Compilation-on-Windows

Errors during make:

  • Error 2 during “make clean”

     /bin/sh: 1: cd: can't cd to ../../lib  
    

    If you run make clean after some folders have already been removed manually, you receive this error. To resolve it, recreate the missing folder manually, e.g. “mkdir lib” to recreate the lib folder.

  • Error during make lpjliveview

    • edit “config/Makefile.intel_mpi”
    • remove “-lxcb-xlib” in line 36: X11LIB = -L/usr/X11R6/lib64 -lX11 -lxcb -lxcb-xlib -lXau

Setup the parameters for the model

The model is now ready to run, but you should first adapt its parameters to your needs. The three main files for this are lpjml.conf (general setup, e.g. settings or start and stop year), input.conf (regional input files such as climate or landuse patterns) and param.conf (global model parameters).

Note that the model code does not include any input files such as climate or landuse patterns. Unless you run LPJmL on the PIK infrastructure, you will have to create all required input files yourself and update the paths to these files in the respective input.conf file.

  • Change lpjml.conf to specify:

    • the location of parameter files and part of the input data

/*===================================================================*/
/*  II. Input parameter section                                      */
/*===================================================================*/

#include "param.conf"    /* Input parameter file */

/*===================================================================*/
/*  III. Input data section                                          */
/*===================================================================*/

#include "input_crumonthly.conf"    /* Input files of CRU dataset */

#if defined(WITH_WATERUSE) && defined(WITH_LANDUSE)
CLM2 /p/projects/lpjml/input/historical/input_VERSION2/wateruse_1900_2000.bin /* water consumption for industry,household and livestock */
#endif 

    • the names and locations of the output files (make sure they match the number and order of output files specified in conf.h; see below for further information on output files)

/*
ID                  Fmt filename
------------------- --- ----------------------------- */
GRID                RAW output/grid.bin
FPC                 RAW output/fpc.bin
... 

    • the spin-up period (number of years the model is run towards equilibrium with constant climate)

 5010  /* spinup years */

    • the cells to be computed (“ALL”, “singlecellnumber”, or “startcellnumber endcellnumber”, where cell numbers range from 0 to 67419)

 ALL  /* 27410 67208 60400 all grid cells */ 
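
The three accepted forms of the cell setting can be checked before a run with a short Python sketch. The function name `parse_cells` is hypothetical, and the upper bound 67419 is simply the value stated in this guide; a different grid would need a different bound.

```python
MAX_CELL = 67419  # highest cell number stated in this guide


def parse_cells(setting):
    """Return (first, last) cell numbers for 'ALL', 'n', or 'start end'."""
    parts = setting.split()
    if parts == ["ALL"]:
        return (0, MAX_CELL)
    if len(parts) not in (1, 2):
        raise ValueError(f"expected ALL, one number, or two numbers: {setting!r}")
    # int() raises ValueError for anything that is not an integer.
    numbers = [int(part) for part in parts]
    first, last = numbers[0], numbers[-1]
    if not 0 <= first <= last <= MAX_CELL:
        raise ValueError(f"cell numbers must satisfy 0 <= start <= end <= {MAX_CELL}")
    return (first, last)


if __name__ == "__main__":
    for example in ("ALL", "27410", "27410 67208"):
        print(example, "->", parse_cells(example))
```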

    • the start and end year of the simulation

1901 /* first year of simulation */
1901 /* last year of simulation */
 
/*===================================================================*/
/*  V. Run settings section                                          */
/*===================================================================*/

ALL  /* 27410 67208 60400 all grid cells */

#ifndef FROM_RESTART

5000  /* spinup years */
/* exclude next line in case of 0 spinup years */
30   /* cycle length during spinup (yr) */
1901 /* first year of simulation */
1901 /* last year of simulation */
NO_RESTART /* do not start from restart file */
RESTART /* create restart file: the last year of simulation=restart-year */
restart/restart_1840_nv_stdfire.lpj /* filename of restart file */
1840 /* write restart at year; exclude line in case of no restart to be written */

#else

390  /* spinup years */
/* exclude next line in case of 0 spinup years */
30 /*cycle length during spinup (yr)*/
1901 /* first year of simulation */
2011 /* last year of simulation */
RESTART /* start from restart file */
restart/restart_1840_nv_stdfire.lpj /* filename of restart file */
RESTART /* create restart file */
restart/restart_1900_crop_stdfire.lpj /* filename of restart file */
1900 /* write restart at year; exclude line in case of no restart to be written */

#endif 

    • …

Generally you need three different runs; in some cases, runs 2 and 3 can be combined.

  • potential natural vegetation (pnv) spinup: ~5000 years to fill the carbon pools (SPINUPYEARS=5000, STARTYEAR=1901, STOPYEAR=1901)
    • no landuse, no wateruse, no irrigation, no reservoirs, riverrouting enabled, no read restart, write restart, no fixed sowing dates
  • landuse spinup: from 1700 to the start of the simulation period (e.g. 1700-1999), so that all landuse cells have realistic soil property changes from past agricultural use
    • irrigation, landuse, riverrouting, wateruse, reservoirs, read restart, write restart
    • if your climate input starts only in 1901 or later, you have to use the spinup years to cover the landuse input from earlier years (SPINUPYEARS=201, STARTYEAR=1901, STOPYEAR=1999)
  • actual run: simulation period (e.g. 2000-2100)
    • irrigation, landuse, riverrouting, wateruse, reservoirs, read restart, no write restart
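
The value SPINUPYEARS=201 in the landuse spinup follows from simple arithmetic: the years 1700 to 1900 precede the first climate year 1901. A minimal Python sketch of that calculation is given below. It assumes that spinup years are counted backwards from the first simulation year, which is what the 1700/1901/201 example implies; the function name `spinup_years` is hypothetical.

```python
def spinup_years(landuse_start, first_climate_year):
    """Number of years before the first climate year that the spinup must cover."""
    if landuse_start > first_climate_year:
        raise ValueError("landuse input starts after the first climate year")
    # Example: 1700..1900 inclusive are 201 years before STARTYEAR=1901.
    return first_climate_year - landuse_start


if __name__ == "__main__":
    print("SPINUPYEARS =", spinup_years(1700, 1901))
```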

Configuration_files shows an example of what needs to be changed in one specific case; your case will most probably differ.

See these pages for more information: Input | Output | Parameter (these pages might be outdated: parameters may have been added or changed since they were written)

Running the model

  • Go to your LPJROOT folder.
  • First check whether your lpjml configuration file is consistent and all files are present. lpjcheck does that for you: ./bin/lpjcheck lpjml.conf
  • If you have not created an output folder yet, you can do so with mkdir: mkdir restart for the spinup run and mkdir output for the transient (main) run. If you want specific folder names, e.g. for multiple runs, you can insert #define output my_output_folder into the definitions at the top of lpjml.conf.

  • On the cluster, load the lpjml module (if it is not already loaded). On other machines, make sure the necessary tools are available.

module load lpjml

  • Source the lpj_paths file:

. ./bin/lpj_paths.sh

(short for “source ./bin/lpj_paths.sh”) This makes the paths of your LPJ folder available in your current shell session. Don’t forget the first dot!

    • If lpjml is started from a directory other than the root directory, do not source the lpj_paths file; set the environment variable LPJROOT manually instead:

export LPJROOT=<lpjml root directory>

  • Submit the run to the Slurm queue on the cluster:

    ./bin/lpjsubmit_slurm -group grpname -blocking 16 ntasks [-DFROM_RESTART] lpjml.conf 
    
    • grpname might be open/macmit/biodiv
    • -blocking 16 reserves whole nodes, which is more stable but takes longer to start the run; otherwise the tasks may be spread over many nodes.
    • ntasks can be a multiple of 2: 64, 128, 256 … The more tasks you use, the faster the model runs (parallelization), but the longer the run takes to start, because you request more resources. 100 years with 256 tasks take approx. 15 min (Jan 2018).
    • if you know the job will not take much time, it is useful to limit the processing time, for example by adding -wtime 5:00:00
      • this will sometimes shorten the waiting time :-)
    • if you want to start from a restart file, include the -DFROM_RESTART part; otherwise omit it
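
The options described above can be assembled into a command line with a small Python sketch. Only the options named in this guide are used (-group, -blocking, -wtime, -DFROM_RESTART). The function name `submit_command` is hypothetical, and placing -wtime before ntasks is an assumption, since the guide only says the option is added.

```python
def submit_command(group, ntasks, conf="lpjml.conf",
                   blocking=None, wtime=None, from_restart=False):
    """Assemble an lpjsubmit_slurm call as an argument list."""
    cmd = ["./bin/lpjsubmit_slurm", "-group", group]
    if blocking is not None:
        cmd += ["-blocking", str(blocking)]
    if wtime is not None:
        # Assumption: -wtime goes with the other options, before ntasks.
        cmd += ["-wtime", wtime]
    cmd.append(str(ntasks))
    if from_restart:
        cmd.append("-DFROM_RESTART")
    cmd.append(conf)
    return cmd


if __name__ == "__main__":
    print(" ".join(submit_command("open", 256, blocking=16, from_restart=True)))
```

Printing the command first and pasting it into the shell keeps the submission itself under your control; nothing is sent to the queue by this sketch.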

More information on how to run the model can be found here: running_lpjml

Speed up queuing time

The Slurm management system calculates a priority rating for your submission. The time you have to wait until your run starts depends on the current workload of the cluster, the number and rating of jobs already submitted, and the rating of your run. The better your rating, the sooner your run can start. But how can you influence your priority rating?

  • You can reduce the waiting time by disabling blocking: remove -blocking 16. Your tasks may then be spread over many nodes, which allows them to start sooner but can make the run take longer. The variation in runtime also increases.
  • Setting ntasks to a lower value lets the run start sooner but run longer. Find a setup that suits you.
  • Find a -wtime that lets the simulation finish within the time limit (mind the variation due to non-blocking) but is as short as possible, so that the run starts earlier. Caution: if you are too greedy/optimistic, your run is cancelled and you have to queue again, which probably won’t be worth it!
  • Choose the right partition! Sometimes one partition (e.g. “ram_gpu”) has a lower workload than “standard” or “broadwell”. You can check the load of a partition with the “sinfo” command (see Slurm Management System in Linux-Terminal-Commands). If a partition has more idle nodes than you need, go! Otherwise (if there are mixed nodes) you might consider deactivating blocking.

Error-messages via email

The Slurm submission script is configured to send you an email when the simulation finishes. The email shows the status with which the run stopped (COMPLETED, CANCELLED, FAILED, OUT_OF_MEMORY).
For hints on what went wrong, check the error and output files lpjml.%i.err and lpjml.%i.out.
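
To find the relevant error files quickly, a short Python sketch like the following can list every non-empty lpjml.*.err file together with its last non-blank line. It assumes that %i in the file name is replaced by a job number, so that the files match the pattern lpjml.*.err; the function name `nonempty_error_logs` is hypothetical.

```python
import glob
import os


def nonempty_error_logs(directory="."):
    """Map each lpjml.*.err file that contains text to its last non-blank line."""
    report = {}
    pattern = os.path.join(directory, "lpjml.*.err")
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", errors="replace") as handle:
            lines = [line.strip() for line in handle if line.strip()]
        if lines:
            report[path] = lines[-1]
    return report


if __name__ == "__main__":
    for path, last_line in nonempty_error_logs().items():
        print(f"{path}: {last_line}")
```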

Some ideas:

  • Are the input files compliant with your start-end date?
  • balanceW/C Error:
    • The water/carbon balance is not correct -> the threshold in src/lpj/check_fluxes.c:138 can be adjusted
    • It is better to solve the underlying problem than to suppress the error by commenting out the line or increasing the threshold!

Analyze output

By default, the output files are saved to the “output” directory in your lpj folder. They are saved as plain binary data, which means that you cannot look at them directly but need a tool to extract the values. The output files do not contain a header. For output files that hold several variables, the order of the variables can be found in Output.
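
As an illustration of what such a tool does, the following Python sketch reads a headerless binary file into per-year, per-band arrays. The layout is an assumption, not something this page specifies: 4-byte floats in the byte order of the machine reading the file, stored year by year, with all cells of one band written contiguously. Check the Output page for the actual layout of each file before relying on it. The function name `read_raw` is hypothetical.

```python
import os
from array import array


def read_raw(path, ncell, nbands=1):
    """Read a headerless float file as years -> bands -> array of ncell values.

    Assumed layout (not confirmed by this guide): 4-byte floats, native byte
    order, ordered by year, then band, then cell.
    """
    itemsize = array("f").itemsize
    bytes_per_year = ncell * nbands * itemsize
    size = os.path.getsize(path)
    if size % bytes_per_year != 0:
        raise ValueError("file size does not fit ncell * nbands floats per year")
    years = []
    with open(path, "rb") as handle:
        for _ in range(size // bytes_per_year):
            data = array("f")
            data.fromfile(handle, ncell * nbands)
            # Split the yearly block into one array per band.
            years.append([data[b * ncell:(b + 1) * ncell] for b in range(nbands)])
    return years
```

With the full grid, `ncell` would be 67420, so `read_raw("output/fpc.bin", 67420, nbands)` returns one entry per simulated year once the correct number of bands for that file is known.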

Outfile

As a rough first check of whether everything worked properly, you can look at the *.out file that is created during the run. Unless you have changed the settings, it resides in the LPJROOT directory.
It starts with a long list of parameters and settings and then displays important global carbon and water fluxes for every modelled year.

First year:                    2000
Last year:                     2109
Number of grid cells:         67420
==============================================================================
Simulation begins...

                  Carbon flux (GtC)                                Water (km3)
       --------------------------------------- ---------------------------------------------
Year   NEP     estab   fire    harvest total   transp     evap    interc  wd      discharge
------ ------- ------- ------- ------- ------- ---------- ------- ------- ------- ----------
  2000  18.798   0.217   3.206  12.473   3.336    43448.8 13646.3  7618.8   981.8    58945.5
  2001  16.248   0.218   3.487  11.756   1.223    42669.1 13180.4  7334.2   989.8    55353.3 
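
The yearly rows of this table are easy to extract with a script. The Python sketch below parses the data rows and checks a relation that holds for the two rows shown here: total equals NEP + estab - fire - harvest. Treat that relation as an observation from this example, not as a documented definition. The names `parse_flux_rows` and `carbon_residual` are hypothetical.

```python
COLUMNS = ["year", "NEP", "estab", "fire", "harvest", "total",
           "transp", "evap", "interc", "wd", "discharge"]

SAMPLE = """\
Year   NEP     estab   fire    harvest total   transp     evap    interc  wd      discharge
------ ------- ------- ------- ------- ------- ---------- ------- ------- ------- ----------
  2000  18.798   0.217   3.206  12.473   3.336    43448.8 13646.3  7618.8   981.8    58945.5
  2001  16.248   0.218   3.487  11.756   1.223    42669.1 13180.4  7334.2   989.8    55353.3
"""


def parse_flux_rows(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have one field per column and start with the year.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        try:
            values = [float(field) for field in fields[1:]]
        except ValueError:
            continue
        row = {"year": int(fields[0])}
        row.update(zip(COLUMNS[1:], values))
        rows.append(row)
    return rows


def carbon_residual(row):
    """Difference between the printed total and NEP + estab - fire - harvest."""
    return row["NEP"] + row["estab"] - row["fire"] - row["harvest"] - row["total"]


if __name__ == "__main__":
    for row in parse_flux_rows(SAMPLE):
        print(row["year"], f"residual = {carbon_residual(row):+.4f} GtC")
```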

Flags for running LPJmL

Switch flags on by calling lpjml -DFLAG, e.g. lpjml -DISRANDOM

  • FROM_RESTART: A restart file can be generated at the end of an LPJ run (see Setup the parameters for the model).
    Set this flag to restart from this point (which might be useful e.g. to start multiple runs from the same spin-up run)

Compiler flags

Compilation of LPJmL is customized by defining macros in the LPJFLAGS
section of “Makefile.inc”. You have to call “make clean” and “make all” to recompile afterwards.

LPJFLAGS= -Dflag1 ...

Flag                Description
------------------- ------------------------------------------------------------
COUPLING_WITH_FMS   enable coupling to FMS
DAILY_ESTABLISHMENT Enable daily establishment 
DEBUG               diagnostic output is generated for debugging purposes
DOUBLE_HARVEST      adding correct sequencing of harvest events
IMAGE               include coupler to IMAGE model
LINEAR_DECAY        use linearized functions for litter decay
MICRO_HEATING       Enable microbial heating
SAFE                code is compiled with additional checks
STORECLIMATE        store climate data in memory for spin up phase
USE_MPI             compile parallel version of LPJmL
USE_NETCDF          enable NetCDF input/output
USE_NETCDF4         enable NetCDF version 4 input/output
USE_RAND48          use drand48() random number generator
USE_UDUNITS         enable unit conversion in NetCDF files
WITH_FPE            floating point exceptions are enabled for debugging purposes
LANDUSE             includes all land-use specific routines, especially the memory allocation to the cropdates and the landuse structure (only needed in the LPJ_SPEEDY branch)
NEW_GRASS           implementation of grasses (e.g. for bioenergy-grasses) - only for LPJmL 3.X - for LPJmL 4.0 and later it is part of the standard config
------------------- ------------------------------------------------------------
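
To avoid typos when editing the LPJFLAGS line, the flag names can be checked against the table above before the line is written. The Python sketch below does that; the function name `lpjflags_line` is hypothetical, and the set of known flags is copied from the table in this guide, so it may be incomplete for your LPJmL version.

```python
KNOWN_FLAGS = {
    "COUPLING_WITH_FMS", "DAILY_ESTABLISHMENT", "DEBUG", "DOUBLE_HARVEST",
    "IMAGE", "LINEAR_DECAY", "MICRO_HEATING", "SAFE", "STORECLIMATE",
    "USE_MPI", "USE_NETCDF", "USE_NETCDF4", "USE_RAND48", "USE_UDUNITS",
    "WITH_FPE", "LANDUSE", "NEW_GRASS",
}


def lpjflags_line(flags):
    """Format an LPJFLAGS line, rejecting names that are not in the table."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError("unknown flag(s): " + ", ".join(unknown))
    return "LPJFLAGS= " + " ".join(f"-D{flag}" for flag in flags)


if __name__ == "__main__":
    print(lpjflags_line(["SAFE", "USE_MPI"]))
```

The printed line still has to be copied into Makefile.inc by hand, followed by make clean and make all.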

EFRs

To run with EFRs, …

Debugging

See Debugging

Documentation / code

See the model documentation for navigating through the model structure.

Check out the LPJmL4 model documentation publications for more details:
http://dx.doi.org/10.5194/gmd-11-1343-2018
http://dx.doi.org/10.5194/gmd-11-1377-2018