Practical information - GLOBIO4/GlobioModelPublic GitHub Wiki

Coordinate reference system

For all calculations the WGS84 coordinate reference system is used.

Geographic extent and cellsize

GLOBIO 4 uses different kinds of input raster datasets to calculate the final raster results, and these datasets are combined cell by cell. To combine the rasters, their cells need to be aligned and have the same size, and the rasters themselves need to have the same dimensions, i.e. the same number of rows and columns.

When input rasters with different extents or cellsizes are used, GLOBIO resamples and/or resizes the datasets. At the moment resampling is limited: it is only possible when the new cellsize is a multiple of the original cellsize.
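The multiple-of-cellsize restriction can be checked before a run. The following is a minimal sketch, not GLOBIO's own code: the function name `can_resample` and the tolerance are assumptions made for illustration. A tolerance is used because cellsizes such as 1/120 degree cannot be stored exactly as floating point values.

```python
import math

def can_resample(original_cellsize, new_cellsize, rel_tol=1e-6):
    """Return True if new_cellsize is a whole multiple of original_cellsize."""
    ratio = new_cellsize / original_cellsize
    nearest = round(ratio)
    # Compare with a tolerance: 0.00833333333333 is only an approximation
    # of 1/120, so the ratio is rarely an exact integer.
    return nearest >= 1 and math.isclose(ratio, nearest, rel_tol=rel_tol)

print(can_resample(0.00833333333333, 0.0833333333333))  # 30sec to 5min
print(can_resample(0.0833333333333, 0.5))               # 5min to 30min
print(can_resample(0.5, 0.75))                          # not a multiple
```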

Resampling and resizing take time, so to reduce computation time it is recommended to use input data with the same cellsize and geographic extent wherever possible.

In GLOBIO 4 computations are done with the following default geographic extents:

  Name     Extent (minx, miny, maxx, maxy) in degrees
  -------- ------------------------------------------
  world    -180, -90, 180, 90
  wrld     -180, -90, 180, 90
  europe   -25, 33, 45, 72
  eu       -25, 33, 45, 72
  nl       3, 50, 8, 54

The following cellsizes are defined:

  Name    Cellsize in degrees
  ------- -------------------
  10deg   10
  1deg    1
  30min   0.5
  5min    0.0833333333333
  30sec   0.00833333333333
  10sec   0.00277777777778

These defaults are defined in the GLOBIO configuration file, and their names can be used as constants in configuration scripts. If needed, the default values can be changed by modifying the GLOBIO configuration.
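To see how an extent and a cellsize together determine the raster dimensions, the sketch below computes the number of rows and columns for some of the defaults listed above. The dictionaries and the function `raster_shape` are hypothetical names used for illustration only; they are not part of the GLOBIO configuration.

```python
# Extents as (minx, miny, maxx, maxy) in degrees, copied from the table above.
EXTENTS = {
    "world": (-180, -90, 180, 90),
    "europe": (-25, 33, 45, 72),
    "nl": (3, 50, 8, 54),
}

# Cellsizes in degrees, copied from the table above.
CELLSIZES = {
    "30min": 0.5,
    "5min": 0.0833333333333,
    "30sec": 0.00833333333333,
    "10sec": 0.00277777777778,
}

def raster_shape(extent_name, cellsize_name):
    """Return (rows, columns) of a raster covering the named extent."""
    minx, miny, maxx, maxy = EXTENTS[extent_name]
    cellsize = CELLSIZES[cellsize_name]
    # round() absorbs the small error in the truncated decimal cellsizes.
    n_cols = round((maxx - minx) / cellsize)
    n_rows = round((maxy - miny) / cellsize)
    return n_rows, n_cols

print(raster_shape("world", "30sec"))
print(raster_shape("europe", "30min"))
```

Two rasters can only be combined cell by cell when this shape, the extent and the cellsize all match.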

Numbers and decimal points

In configuration files always use a "." as decimal separator for floating point values. Do not use a regionally defined decimal separator such as ",".

In data files and lookup files both a "." and a "," can be used. They will be converted to the separator that is used internally.
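Such a conversion can be as simple as the sketch below. The function name `parse_number` is an assumption, and the sketch assumes that the values contain no thousands separators.

```python
def parse_number(text):
    """Convert text such as '0,0356' or '0.0356' to a float."""
    # Assumes "," only ever appears as a decimal separator.
    return float(text.strip().replace(",", "."))

print(parse_number("0,0356"))
print(parse_number("0.0356"))
```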

Using lookup files

In previous versions of GLOBIO many relationships between data classes (landuse, biomes, etc.) and corresponding properties (MSA, MSA loss, etc.) or reclassifications (land-use to natural land-use) were hard coded in the source code. Using different relationships or reclassifications was rather complicated and time consuming.

In GLOBIO 4 all relationships and reclassifications are put in separate lookup files. An example of the lookup file for the relation between biomes and MSA loss is shown below.

BIOME;MSALOSS;DESCRIPTION;AGGRBIOME1;AGGRBIOME2
7;0,0356;Ice;5;1
8;0,0426;Tundra;5;1
9;0,0426;Wooded tundra;5;1
10;0,0367;Boreal forest;4;4
11;0,1127;Cool coniferous forest;4;4
12;0,0487;Temperate mixed forest;4;5
13;0,0710;Temperate deciduous forest;4;5
14;0,1457;Warm mixed forest;4;5
15;0,1201;Grassland and steppe;3;2
16;0,1201;Hot desert;5;7
17;0,0661;Scrubland;1;3
18;0,0775;Savanna;1;3
19;0,1075;Tropical woodland;2;6
20;0,1075;Tropical forest;2;6
21;0,0661;Mediterranean shrub;4;3

The lines of a lookup file should always contain at least two fields: the first field is the key value, the second field is the target value. Additional fields may exist, but will be ignored (except in a two-to-one relationship lookup file, see below). The key values in the processed dataset are looked up and replaced with the target values in the output dataset.

Lookup files are ASCII files and should meet the following requirements.

the first line should contain the names of the fields (for documentation);
there should be at least two fields; extra fields are ignored;
fields should be separated by a ";".
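A lookup file that meets these requirements can be read with a few lines of Python. This is a minimal sketch, not the reader used by GLOBIO: the function name `read_lookup` is hypothetical, and it assumes integer key values and numeric target values.

```python
import csv
import io

def read_lookup(text):
    """Parse lookup-file text into a {key: target} dictionary."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader)  # skip the first line with the field names
    lookup = {}
    for fields in reader:
        if len(fields) < 2:
            continue  # skip empty or incomplete lines
        key = int(fields[0])
        # Accept both "," and "." as decimal separator; extra fields are ignored.
        lookup[key] = float(fields[1].replace(",", "."))
    return lookup

sample = """BIOME;MSALOSS;DESCRIPTION;AGGRBIOME1;AGGRBIOME2
7;0,0356;Ice;5;1
8;0,0426;Tundra;5;1
"""
biome_msaloss = read_lookup(sample)
print(biome_msaloss)
```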

It is possible to assign specific values to just one or a few key values and to assign another value to all other key values. To do this, use "*" as the key value for all other keys.

Example:

LANDCOVER;NOTALLOCATABLE
*;0
0;1
210;1
220;1

Lookup files can be created with a text editor. Excel or LibreOffice Calc sheets can also easily be exported to the required format: do "Save as", choose "As type: Text CSV", use ";" as field delimiter and select "Save cell content as shown".

In floating point numbers both a "." and a "," can be used as decimal separator. They will be converted to the separator that is used internally.

The next example shows the contents of a lookup file with a two-to-one relationship between landuse and biomes on the one hand, and the Landuse MSA on the other. In two-to-one lookup files the target value depends on two key values.

LANDUSE;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21
1;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
2;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
3;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
4;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
5;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
6;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
7;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
8;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
9;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
10;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300
11;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
12;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
13;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
14;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
15;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
16;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300;0,300
17;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
18;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
19;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
20;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
21;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000;1,000
22;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050;0,050
23;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000;0,000

A two-to-one lookup file is laid out as a kind of pivot table, with the first key on the vertical axis and the second key on the horizontal axis.

Two-to-one lookup files should meet the following requirements.

The first line should contain the name of the first key, followed by the values of the second key.
The following lines should contain the first key value followed by the target values for all second key values.
No extra fields are allowed.
Fields should be separated by a ";".
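A file in this pivot layout can be turned into a dictionary keyed on both values. The sketch below uses a shortened version of the example file; the function name `read_two_to_one` is hypothetical and integer keys are assumed.

```python
def read_two_to_one(lines):
    """Return {(first_key, second_key): target} from pivot-style lines."""
    header = lines[0].split(";")
    second_keys = [int(field) for field in header[1:]]
    table = {}
    for line in lines[1:]:
        fields = line.split(";")
        # No extra fields are allowed, so every line must match the header.
        if len(fields) != len(header):
            raise ValueError("unexpected number of fields: " + line)
        first_key = int(fields[0])
        for second_key, target in zip(second_keys, fields[1:]):
            table[(first_key, second_key)] = float(target.replace(",", "."))
    return table

lines = [
    "LANDUSE;7;8;9",
    "1;1,000;1,000;1,000",
    "10;0,300;0,300;0,300",
]
landuse_msa = read_two_to_one(lines)
print(landuse_msa[(10, 8)])  # landuse 10 in biome 8
```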

The lookup file above, with the relationship between landuse and biomes and the Landuse MSA, is a dummy file in which the Landuse MSA does not vary over the biomes; it varies only over the landuse. A simple one-to-one lookup file would therefore give the same result. However, using the two-to-one file takes much more execution time (in a test, 20 times more).

To be able to use lookup files for all MSA impact values, lookup files with a single value are also supported. The next example shows the contents of a lookup file with only one value.

MSA
0,78

These lookup files should meet the following requirements.

The first line should contain the name of the value.
The following line should contain the value (no key value is needed).
No extra fields are allowed.

Fragmentation by infrastructure

To calculate the impact of fragmentation by infrastructure, highways and primary and secondary roads are selected. Using this network of roads, patches of natural land-use are determined. The area of these patches is used to reclassify and assign the final MSA impact.

To create these patches adjacent raster cells with natural land-use are grouped together. Raster cells are adjacent when they touch each other horizontally or vertically. While creating the patches the selected roads are used as barriers between patches. Horizontal and vertical "pass through" will result in split patches.
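The grouping of adjacent cells can be sketched with a simple flood fill. This is an illustration only, not GLOBIO's implementation: the function name `label_patches` and the small rasters are made up, and the patch area is expressed as a cell count.

```python
from collections import deque

def label_patches(natural, roads):
    """Label 4-connected patches of natural cells; road cells act as barriers."""
    n_rows, n_cols = len(natural), len(natural[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    current = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not natural[r][c] or roads[r][c] or labels[r][c]:
                continue
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Only horizontal and vertical neighbours count as adjacent.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and natural[ny][nx] and not roads[ny][nx]
                            and not labels[ny][nx]):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels

natural = [[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]
roads = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]
labels = label_patches(natural, roads)

# Patch area, counted in cells per patch label.
areas = {}
for row in labels:
    for label in row:
        if label:
            areas[label] = areas.get(label, 0) + 1
print(labels)
print(areas)
```

In this example the road in the middle column splits the natural area into two patches of three cells each.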

To implement this calculation for road networks with vertical passes, a method is used that temporarily increases the raster resolution and performs a kind of interpolation. The result is a temporary road network without vertical passes. This network is used to calculate the natural land-use patches.

For small rasters this method works. Due to a bug in the NumPy library, however, it crashes when big rasters are used.

As a workaround, a method is implemented which copies all road raster cells one raster cell down. This also results in a road network without vertical passes. The disadvantage of this method compared to the former method is that the area of natural land-use is underestimated. The underestimation is smaller than when a road buffer is used to close the passes.
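The mechanics of this workaround can be sketched as follows. The function name `copy_roads_one_cell_down` and the sample raster are hypothetical, and "down" is assumed to mean the next row in a raster whose first row is the top row. Which passes are actually closed depends on the road data.

```python
def copy_roads_one_cell_down(roads):
    """Return a copy of the road raster with every road cell also set one row down."""
    closed = [row[:] for row in roads]
    for r in range(len(roads) - 1):
        for c, is_road in enumerate(roads[r]):
            if is_road:
                closed[r + 1][c] = 1
    return closed

roads = [[1, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
closed = copy_roads_one_cell_down(roads)
for row in closed:
    print(row)
```

The extra road cells replace cells that might otherwise be natural land-use, which is why the natural area is underestimated.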

In practice the roads dataset does not contain a network with vertical passes. When calling the fragmentation by infrastructure calculation, the flag CloseRoadConnections can be used to enable or disable closing passes in the road network.

Working with big rasters

In most cases GLOBIO calculations are made on a global scale. Previous versions of GLOBIO used a resolution of 30 arc minutes and land-use fractions. In GLOBIO 4 it is possible to use high resolution input data. On a global scale the impact calculations have been successfully tested with a raster resolution of 30 arc seconds.

When programming GLOBIO 4 special attention has been paid to deal with these high resolution rasters. The following measures have been taken:

using rasters with the byte data type when possible; this is only possible for numbers (class ids) with values from 0 to 255 and not for floating point values;
releasing a raster from memory as soon as its data is no longer needed;
doing calculations in place, using the +=, -= and *= operators.
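These three measures can be illustrated with NumPy. The sketch below is an assumption about how they look in practice, not code taken from GLOBIO; the array names and values are made up.

```python
import numpy as np

# Class ids between 0 and 255 fit in one byte per cell.
landuse = np.array([[1, 10, 22], [23, 1, 10]], dtype=np.uint8)

# Floating point rasters need 8 bytes per cell with float64.
msa = np.ones(landuse.shape, dtype=np.float64)
impact = np.full(landuse.shape, 0.5, dtype=np.float64)

# "msa = msa * impact" would allocate a third raster for the result;
# "*=" writes the result into the existing msa array instead.
msa *= impact

# Drop the reference as soon as the raster is no longer needed,
# so that its memory can be reclaimed.
del impact

print(landuse.nbytes, msa.nbytes)
```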

Calculations on a global scale using raster data with a resolution of 10 arc seconds are only possible on computers with at least 256 GB of RAM. Performing calculations on a regional scale with a resolution of 10 arc seconds is no problem on more regular computers. Impact calculations for Europe were successfully tested with a raster resolution of 10 arc seconds.

With a global scale raster and WGS84 as coordinate reference system, the following numbers of cells need to be managed.

  Resolution   Degrees   Meter*   Number of cells    Total number of cells
  ------------ --------- -------- ------------------ ---------------------
  30 minutes   0.5       55,590   720 x 360          259,200
  5 minutes    0.0833    9,265    4,320 x 2,160      9,331,200
  30 seconds   0.00833   927      43,200 x 21,600    933,120,000
  10 seconds   0.00278   309      129,600 x 64,800   8,398,080,000

  *) On the equator: 1 minute = 1853 meter.

In GLOBIO 4 rasters need to be read in memory in one piece. For reading rasters in Python various data types can be used. Most relevant are:

  Data type   Number of bytes   Value range
  ----------- ----------------- ---------------------------------
  byte        1                 0 to 255
  word        2                 0 to 65,535
  integer     4                 -2,147,483,648 to 2,147,483,647
  float       8                 -1.8e308 to 1.8e308

This will result in the following overview of memory (RAM) needed to load a raster.

  Resolution   Data type   Memory (GB)
  ------------ ----------- -----------
  30 minutes   byte        0.00025
  30 minutes   word        0.0005
  30 minutes   integer     0.001
  30 minutes   float       0.002
  5 minutes    byte        0.009
  5 minutes    word        0.018
  5 minutes    integer     0.036
  5 minutes    float       0.073
  30 seconds   byte        0.911
  30 seconds   word        1.822
  30 seconds   integer     3.645
  30 seconds   float       7.290
  10 seconds   byte        8.201
  10 seconds   word        16.402
  10 seconds   integer     32.805
  10 seconds   float       65.610
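The memory needed for one raster is the number of cells multiplied by the number of bytes per cell. The sketch below reproduces that calculation under the assumption that 1 GB equals 10**9 bytes; with a different convention the results differ by a few percent, so they may not match the rounded figures in the table exactly. The names used are hypothetical.

```python
BYTES_PER_CELL = {"byte": 1, "word": 2, "integer": 4, "float": 8}

# Number of cells of a global raster, from the resolution table above.
CELLS = {
    "30 minutes": 720 * 360,
    "5 minutes": 4320 * 2160,
    "30 seconds": 43200 * 21600,
    "10 seconds": 129600 * 64800,
}

def raster_memory_gb(resolution, data_type):
    """Memory in GB (here 10**9 bytes) needed to hold one global raster."""
    return CELLS[resolution] * BYTES_PER_CELL[data_type] / 10**9

for resolution in CELLS:
    for data_type in BYTES_PER_CELL:
        print(resolution, data_type, round(raster_memory_gb(resolution, data_type), 3))
```

Note that this is the memory for a single raster; a calculation that holds several rasters at once needs a multiple of it.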

A regular Windows 64-bit PC can address a maximum of 64 GB. The current version of GLOBIO uses data types varying from byte to float. So, working with rasters with a resolution of 10 arc seconds at a global scale is very hard to do.

A solution could be to read parts of a raster from disk into memory instead of reading the whole raster into memory. Unfortunately some impact calculations use zonal algorithms (like infrastructure fragmentation) which have to be applied to the whole raster, so this solution would not work.

Solutions which could work are:

use special computers with a lot of memory;
use an operating system like some Linux distributions which can address more than 64 GB memory;
write your own low level software to do the calculations on disk;
use an intermediate word data type for all integer and float rasters.

For this last solution the impact calculations of the current GLOBIO version should be modified.

Convert ESRI grids to TIF rasters

To prevent memory problems when using ESRI grids as input datasets during GLOBIO runs it is recommended to convert these rasters to TIF rasters first. If possible use the data type byte.

Performance and memory usage

Testing the impact calculations of the current GLOBIO version has been done in a VMware Virtual Machine (VM) with 24 GB of memory. The host was a regular PC.

The following durations of a full run were measured for tests in the VM.

  Resolution   Region   Duration
  ------------ -------- -------------
  30 seconds   Europe   1 min 54 sec
  30 seconds   World    19 min 31 sec

During these tests no temporary data was saved. These execution times did not justify putting effort into designing parallel algorithms. It seems that parallel algorithms are already used internally by the NumPy and SciPy libraries.

When running GLOBIO 4, consider the following suggestions:

use a PC with multiple cores;
minimize other memory consuming tasks on the PC.

Caution: during execution no checks are done to see if there is enough memory.