02. GAEZ input datasets - un-fao/gaezv5 GitHub Wiki

Climate data

GAEZ v5 uses daily data for six climate attributes describing weather in the past (1981–2020) and under future conditions (2021–2100). The extensive climate database was derived from Copernicus Climate Change Service (C3S) AgERA5 data (C3S, 2020) and the bias-corrected CMIP6 climate scenario data provided by the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) (Hempel et al., 2013; Warszawski et al., 2014).

Observed climate

Time series data for the Global Agro-Ecological Zones historical assessment were obtained from the Copernicus Climate Change Service (C3S), which provides reanalysis of historic global climates produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), an independent intergovernmental organisation supported by 35 states (see https://www.ecmwf.int/). Reanalysis integrates observations from across the world with model data, resulting in a comprehensive and consistent dataset based on the laws of physics. This methodology enables the generation of observed gridded climate data. GAEZ v5 uses the most recent ERA5 global reanalysis data (Hersbach et al., 2020).

Using the hourly ERA5 data, C3S generated the AgERA5 dataset (Boogaard et al., 2020; C3S, 2020), which provides daily surface meteorological data from 1979 to present, specifically tailored for agriculture and agro-ecological studies. AgERA5 data were aggregated to daily time steps and corrected towards a finer topography at a 0.1° (6 arc-minute) spatial resolution. AgERA5 variables used in GAEZ v5 include daily data for minimum and maximum temperature, solar radiation, vapor pressure, wind speed and precipitation.

With these updated climate databases, historical year-by-year climatic data analysis was extended from year 2010 (as used in GAEZ v4) to 2020 for GAEZ v5. Time series data were combined to compile two average 20-year historical data sets for the periods 1981–2000 and 2001–2020 and to compute raster data with related statistics of medians, standard deviations and coefficients of variation.
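The per-cell statistics named above can be illustrated with a minimal sketch. It assumes a population standard deviation and a coefficient of variation defined as standard deviation divided by mean; the source does not state which estimators GAEZ uses. The function name and the sample values are hypothetical, and only five years are shown instead of twenty for brevity.

```python
from statistics import mean, median, pstdev


def period_statistics(values):
    """Median, standard deviation and coefficient of variation for one grid cell."""
    avg = mean(values)
    sd = pstdev(values)  # assumption: population SD; the source does not specify
    cv = sd / avg if avg != 0 else float("nan")
    return {"median": median(values), "std": sd, "cv": cv}


# Hypothetical annual precipitation totals (mm) for a single grid cell
annual_precip = [500.0, 700.0, 600.0, 800.0, 400.0]
stats = period_statistics(annual_precip)
print(stats)
```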

Six variables with daily climatic data are employed in GAEZ climate analysis and crop biomass/yield estimation, as shown in the table below. AgERA5 climatic surfaces were interpolated at IIASA to a 5 arc-minute grid (about 9 x 9 km at the equator) for all years between 1960 and 2020. A bilinear interpolation method was applied to all variables. For temperature, the interpolation included a correction for altitude using a lapse rate of 0.55 °C per 100 m of elevation, together with the respective digital elevation data at 6 arc-minutes (for the input data) and 5 arc-minutes (for the derived GAEZ climate layers). First, 6 arc-minute elevation data (provided by Copernicus together with AgERA5) were used to calculate temperature values adjusted to sea level. Second, bilinear interpolation was performed on the sea-level temperatures. Third, 5 arc-minute elevation data, derived from the ALOS global digital surface model (Caglar et al., 2018), were used to calculate temperatures at the median altitude of each 5 arc-minute grid cell, compiled from detailed ALOS 1 arc-second (about 30 m at the equator) elevations.
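The three-step temperature procedure can be sketched as follows. This is an illustrative sketch rather than the IIASA implementation: the function names, the tiny 2 x 2 grids and the target elevation are hypothetical, and only the lapse rate (0.55 °C per 100 m) comes from the text.

```python
LAPSE_RATE = 0.55 / 100.0  # °C per metre, i.e. 0.55 °C per 100 m (from the text)


def to_sea_level(temp_c, elev_m):
    """Step 1: remove the altitude effect from a temperature value."""
    return temp_c + LAPSE_RATE * elev_m


def from_sea_level(temp_sl_c, elev_m):
    """Step 3: re-apply the altitude effect at the target elevation."""
    return temp_sl_c - LAPSE_RATE * elev_m


def bilinear(grid, x, y):
    """Step 2: bilinear interpolation at fractional column x and row y."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Hypothetical coarse (6 arc-minute) temperatures (°C) and elevations (m)
coarse_temp = [[20.0, 18.0], [16.0, 14.0]]
coarse_elev = [[0.0, 400.0], [800.0, 1200.0]]

sea_level = [
    [to_sea_level(t, z) for t, z in zip(t_row, z_row)]
    for t_row, z_row in zip(coarse_temp, coarse_elev)
]
t_sl = bilinear(sea_level, 0.5, 0.5)    # fine-cell centre between the four coarse cells
t_fine = from_sea_level(t_sl, 1000.0)   # hypothetical median altitude of the fine cell
print(round(t_sl, 2), round(t_fine, 2))
```

Interpolating at sea level keeps the bilinear step from smearing elevation-driven temperature differences across neighbouring cells; the altitude effect is restored only once the finer elevation data are available.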

Table: Base period climatic input variables used in GAEZ v5.

Variable | Units | Comment
--- | --- | ---
Minimum temperature | °C | 2 m above surface, 24-hour minimum
Maximum temperature | °C | 2 m above surface, 24-hour maximum
Sunshine fraction | % | Converted to solar radiation (W/m²)
Wind speed | m/s | 2 m above surface
Relative humidity | % |
Precipitation | mm | Sum of total precipitation flux within 24 h

See GAEZ v5 dataset on average annual precipitation (mm) on the FAO Agro-Informatic Data Catalog (link).

Climate scenarios

The Scenario Model Intercomparison Project (ScenarioMIP) of the international Coupled Model Intercomparison Project 6 (CMIP6) has developed a new set of climate scenarios for the 21st century. The new scenarios represent combinations of different socio-economic developments as well as different pathways of atmospheric greenhouse gas (GHG) concentrations. Representative Concentration Pathways (RCPs) (Van Vuuren et al., 2011) are a set of GHG concentration trajectories developed for the climate modeling community as a basis for long-term and near-term modeling experiments adopted by the Intergovernmental Panel on Climate Change (IPCC). Narratives of socio-economic developments have been developed for the Shared Socioeconomic Pathways (SSPs) (O’Neill et al., 2017). These descriptions of alternative futures of societal development span a range of possible worlds that stretch along two climate-change-related dimensions: mitigation and adaptation challenges. The SSPs reflect five different developments of the world that are characterized by varying levels of global challenges; see Riahi et al. (2017) for an overview.

Using a predefined subset of these scenarios, climate research institutes all over the world have performed climate change simulations for CMIP6 to serve as a basis for the sixth assessment report of the Intergovernmental Panel on Climate Change (IPCC) (IPCC, 2023).

Unlike the original RCPs used in CMIP5, the new SSP-based scenarios provide economic and social reasons for the assumed emission pathways and changes in land use. The name of each scenario comprises the name of the underlying socioeconomic pathway, followed by two numerals indicating the additional radiative forcing (or ‘heating effect’ caused by GHG in the atmosphere) reached by the year 2100, measured in watts per square meter. Five scenarios are defined as follows:

SSP5-RCP8.5, termed ‘Fossil-Fueled Development’: With an additional radiative forcing of 8.5 W/m² by the year 2100, this scenario represents the upper boundary of the range of scenarios described in the literature. It can be understood as an update of the CMIP5 scenario RCP8.5, now combined with socioeconomic reasons.

SSP3-RCP7.0, ‘Regional Rivalry’: With 7 W/m² by the year 2100, this scenario is in the upper-middle part of the full range of scenarios. It was newly introduced after the RCP scenarios, closing the gap between RCP6.0 and RCP8.5. This scenario envisions a world characterized by resurgent nationalism, regional conflicts, and a focus on domestic issues, leading to fragmented and uneven development.

SSP2-RCP4.5, ‘Middle of the Road’: As an update to scenario RCP4.5, SSP2-RCP4.5 (ssp245), with an additional radiative forcing of 4.5 W/m² by the year 2100, represents the medium pathway of future greenhouse gas emissions. This scenario assumes that climate protection measures are being taken.

SSP1-RCP2.6, ‘Sustainability’: This scenario with 2.6 W/m² by the year 2100 is a remake of the optimistic scenario RCP2.6 and was designed with the aim of simulating a development that is compatible with the 2°C target. This scenario assumes that effective climate protection measures are being taken.

SSP1-RCP1.9: This scenario, with 1.9 W/m² by the year 2100, represents an even more stringent sustainability scenario of a world focused on sustainability and compliant with the Paris Agreement. It aims for very low greenhouse gas emissions and limits global warming to 1.5°C.

For the GAEZ v5 calculations used in this study, the bias-corrected CMIP6 climate forcing provided in ISIMIP3b is used for historical and future conditions. ISIMIP3b provides three future scenarios, SSP1-RCP2.6 (ssp126), SSP3-RCP7.0 (ssp370) and SSP5-RCP8.5 (ssp585), to focus on a range of potential climate futures that cover low, medium, and high greenhouse gas emissions pathways.
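The short scenario codes follow the naming rule described above: one digit for the SSP, then two digits for the radiative forcing in W/m². A minimal sketch of decoding them is shown below; the function name is hypothetical and the helper is not part of any GAEZ or ISIMIP tooling.

```python
import re


def parse_scenario(code):
    """Split a code such as 'ssp126' into (SSP number, forcing in W/m²)."""
    match = re.fullmatch(r"ssp(\d)(\d)(\d)", code.lower())
    if match is None:
        raise ValueError(f"not a recognised scenario code: {code!r}")
    ssp, whole, tenth = match.groups()
    return int(ssp), float(f"{whole}.{tenth}")


for code in ("ssp126", "ssp370", "ssp585"):
    ssp, forcing = parse_scenario(code)
    print(f"{code}: SSP{ssp}, {forcing} W/m² by 2100")
```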

Following a performance assessment in the historical period and considering completeness of data provided from climate simulations both for land and ocean, five models were selected as primary input data: GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0 and UKESM1-0-LL (Lange, 2019, 2021b). The five climate models are considered a good choice because they are structurally independent in terms of their ocean and atmosphere model components and because their process representation is considered by experts to be fair (IPSL-CM6A-LR, MPI-ESM1-2-HR) to good (GFDL-ESM4, MRI-ESM2-0, UKESM1-0-LL). In terms of climate sensitivity, the five primary models are good representatives of the whole CMIP6 ensemble as they include three models with low climate sensitivity (GFDL-ESM4, MPI-ESM1-2-HR, MRI-ESM2-0) and two models with high climate sensitivity (IPSL-CM6A-LR, UKESM1-0-LL). These three scenarios and five climate model projections replace the CMIP5 climate scenarios assessed in GAEZ v4.

ISIMIP3b provides bias-corrected CMIP6 climate forcing for pre-industrial, historical, and three future scenario conditions. The new bias-adjustment corrects the simulated data towards corrected ERA5 observational data (W5E5) (Lange, 2021a). Bias-corrected ISIMIP data at half-degree resolution of five climate models and three scenarios - totaling 15 combinations of scenario and climate models - were used to generate time series of climate input data in GAEZ v5 covering the period 2021 to 2099 and for compiling 20-year average climate attributes across four time periods: 2021–2040 (2030s), 2041–2060 (2050s), 2061–2080 (2070s), and 2081–2100 (2090s).
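The 15 scenario/model combinations and the four 20-year periods can be enumerated with a few lines. The model names, scenario codes and period bounds below are taken from the text; the variable and function names are hypothetical.

```python
from itertools import product

MODELS = ["GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL"]
SCENARIOS = ["ssp126", "ssp370", "ssp585"]
PERIODS = {
    "2030s": (2021, 2040),
    "2050s": (2041, 2060),
    "2070s": (2061, 2080),
    "2090s": (2081, 2100),
}

# Every scenario paired with every climate model
combinations = list(product(SCENARIOS, MODELS))


def period_label(year):
    """Return the 20-year period a year belongs to, or None if outside all periods."""
    for label, (start, end) in PERIODS.items():
        if start <= year <= end:
            return label
    return None


print(len(combinations), period_label(2055))
```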

These GAEZ v5 future climates use the monthly climate signal of the ISIMIP3b climate models and connect it to the historic AgERA5 data as follows: First, the coarser spatial resolution of ISIMIP3b climates is interpolated to 5 arc-minutes as described above for historic conditions. Second, for GAEZ v5, the difference between monthly historical (1981-2000) and future scenario conditions of the ISIMIP climate models is calculated and the delta added to the historic AgERA5 data. Third, the daily distribution of each climate variable in the ISIMIP3b climate models was applied to the monthly climate signal. In the case of precipitation, the future monthly amount of precipitation is evenly distributed over the number of rainy days shown in the climate model. In this way GAEZ v5 generates future daily climates, which represent the climate signal of ISIMIP3b climate models but remain comparable to the historic ‘observed’ climate of AgERA5.
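The delta step and the precipitation rule can be sketched as below. This is a simplified illustration under stated assumptions, not the GAEZ code: the function names are hypothetical, a day counts as rainy when the model's daily precipitation exceeds a threshold of 0 mm (the source gives no threshold), and a month with no rainy days is returned as all zeros.

```python
def apply_monthly_delta(observed, model_hist, model_future):
    """Add the model's monthly change signal (future minus historical) to observations."""
    return [obs + (fut - hist) for obs, hist, fut in zip(observed, model_hist, model_future)]


def distribute_precipitation(monthly_total_mm, model_daily_mm, wet_threshold_mm=0.0):
    """Spread a monthly total evenly over the days the climate model marks as rainy."""
    rainy = [p > wet_threshold_mm for p in model_daily_mm]  # assumption: threshold of 0 mm
    n_rainy = sum(rainy)
    if n_rainy == 0:
        # Assumption: with no rainy day in the model, no precipitation is assigned
        return [0.0] * len(model_daily_mm)
    share = monthly_total_mm / n_rainy
    return [share if wet else 0.0 for wet in rainy]


# Hypothetical monthly mean temperatures (°C) for two months
future_temp = apply_monthly_delta([10.0, 12.0], [9.0, 11.5], [11.0, 14.0])

# Hypothetical five-day "month": 30 mm spread over the model's three rainy days
daily_precip = distribute_precipitation(30.0, [0.0, 5.0, 0.0, 1.0, 2.0])
print(future_temp, daily_precip)
```

Because only the change signal is taken from the climate model, any remaining model bias in absolute values cancels out, which is what keeps the future series comparable with the AgERA5 baseline.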

Use of climate data in GAEZ

The 20-year average climate and year-by-year time series databases for the period 1981–2099 were used to quantify:

  • Agro-climatic indicators, such as the number of growing period days, thermal climate classification, moisture availability indices, net primary production, etc.;
  • Agro-climatic potential crop yields by crop/LUT, their variability, and the related (yield-optimizing) crop calendars and crop water requirements/deficits; and
  • Ensemble mean data sets of agro-climatic indicators and potential crop yields for three SSP-RCP scenarios (ssp126, ssp370, and ssp585) and four future 20-year periods.
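As an illustration of the ensemble step, the sketch below averages one indicator across climate models within each scenario. It assumes an unweighted arithmetic mean, which the source does not confirm; the function name and the sample values are hypothetical.

```python
from statistics import mean


def ensemble_mean(values_by_run):
    """Average an indicator over climate models, separately for each scenario."""
    by_scenario = {}
    for (scenario, _model), value in values_by_run.items():
        by_scenario.setdefault(scenario, []).append(value)
    # Assumption: unweighted arithmetic mean across models
    return {scenario: mean(values) for scenario, values in by_scenario.items()}


# Hypothetical potential yields (t/ha) for one grid cell and one 20-year period
runs = {
    ("ssp126", "GFDL-ESM4"): 4.0,
    ("ssp126", "UKESM1-0-LL"): 6.0,
    ("ssp585", "GFDL-ESM4"): 3.0,
    ("ssp585", "UKESM1-0-LL"): 2.0,
}
ensemble = ensemble_mean(runs)
print(ensemble)
```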

Soil and terrain data

GAEZ v5 includes an inventory of soil and terrain resources. Data are stored at a resolution of 30 arc-seconds, which represents the finest unit of analysis used in the global assessment.

Soil resources data

GAEZ v5 uses the Harmonized World Soil Database (HWSD v2.01) (FAO/IIASA, 2023) as the source of soil resources data for spatially detailed evaluation of soil qualities and edaphic crop suitability. The HWSD v2.0 is composed of a global-level geographical layer containing references to more than 29,000 map units (compared to over 16,000 map units in HWSD v1.2 used in GAEZ v4). The HWSD v2.0 attribute database provides information on the soil unit composition for each of the 29 385 soil association mapping units. Soil attribute data are provided for seven depth layers as available from WISE30sec (Batjes, 2015), namely 0–20 cm, 20–40 cm, 40–60 cm, 60–80 cm, 80–100 cm, 100–150 cm and 150–200 cm (compared to two layers, 0–30 cm and 30–100 cm, in HWSD v1.2).

See HWSD v2.0 dataset on the FAO Agro-Informatic Data Catalog (link).

The HWSD soil information is stored as a 30 arc-second soil map unit raster in GIS, linked to an attribute database stored in MS-Access format. Each HWSD v2 record indicates soil type and soil phase information and includes 17 soil characteristics, each for seven soil layers. For use in GAEZ v5, the procedures for calculating water holding capacity of soils have been enhanced (see Chapter 6, section 6.5). Procedures for dealing with soil phases in cropland have been revisited and revised.

Elevation and terrain-slope data

The altitude and terrain slope database has been compiled using elevation data from the ALOS high-resolution digital elevation model (DEM), known as the ALOS World 3D (AW3D30), produced by the Japan Aerospace Exploration Agency (JAXA). The AW3D30 dataset (Caglar et al., 2018) is a global digital surface model (DSM) with a horizontal resolution of 1 arc-second (approximately 30 meters), created using optical images from the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) on board the Advanced Land Observing Satellite (ALOS). The elevation and terrain slope database comprises the following elements:

  • Elevation (m) by 1 arc-second grid cell and the related median altitude calculated for each 30 arc-second grid cell and 5 arc-minute grid cell of the GAEZ v5 inventory, and
  • Terrain slopes (%) calculated at 1 arc-second and grouped into ten slope gradient classes of 0–0.5%, 0.5–2%, 2–5%, 5–8%, 8–12%, 12–16%, 16–24%, 24–30%, 30–45%, and >45%, respectively.

Note that compiling terrain slopes from 1 arc-second ALOS data yields, for each 30 arc-second grid cell, a distribution of the cell area across the ten slope gradient classes. This feature is exploited in Module V (see Chapter 7) to partition each 30 arc-second grid cell into relevant soil/slope class components, which are each assessed separately for edaphic limitations.
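How a slope-class distribution arises within one coarse cell can be sketched as follows, using the class breaks listed above. The function names are hypothetical, and the treatment of values that fall exactly on a class boundary (assigned to the steeper class here) is an assumption, since the source does not specify it.

```python
from bisect import bisect_right
from collections import Counter

# Upper bounds (percent) of the first nine classes; the tenth class is >45%
SLOPE_BREAKS = [0.5, 2, 5, 8, 12, 16, 24, 30, 45]


def slope_class(slope_pct):
    """Index (0-9) of the slope gradient class for one fine-resolution slope value."""
    # Assumption: a value exactly on a break goes to the steeper class
    return bisect_right(SLOPE_BREAKS, slope_pct)


def slope_distribution(slopes):
    """Share of a coarse cell's area in each of the ten slope classes."""
    counts = Counter(slope_class(s) for s in slopes)
    return [counts.get(k, 0) / len(slopes) for k in range(len(SLOPE_BREAKS) + 1)]


# Hypothetical fine-resolution slopes (%) inside one coarse cell
# (a real 30 arc-second cell holds 30 x 30 = 900 one-arc-second values)
dist = slope_distribution([0.2, 1.0, 1.5, 3.0, 50.0])
print(dist)
```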

See GAEZ v5 median slope class dataset on FAO Agro-Informatic Data Catalog (link).

Land cover data

Information on current land use/cover, in particular cropland and forest cover, is needed to assess and monitor the sustainability of agriculture at local, regional and global scales. Land cover and land use information today is derived from diverse techniques using remotely sensed data from satellites and from national or sub-national statistics collected through agricultural surveys and census campaigns. The UN-FAO collects and disseminates annual land use data from countries via its land use, irrigation and agricultural practices questionnaire, covering the full land use matrix in line with international definitions first developed by the World Census of Agriculture (FAO, 2024). The FAOSTAT database disseminates these statistical data collected and maintained by FAO. It represents the largest statistical database on food and agriculture in the world. GAEZ v5 makes use of six global high-resolution land cover products (Tubiello et al., 2023) together with FAOSTAT statistics for cropland and forests to estimate a spatial distribution of 11 aggregated land cover classes including:

  1. Built-up land, artificial surfaces
  2. Cropland
  3. Grassland
  4. Tree-covered areas
  5. Shrub-covered areas
  6. Shrub/Herbaceous, regularly flooded
  7. Tree-covered, regularly flooded, saline
  8. Lichen and mosses
  9. Bare or sparsely vegetated land
  10. Permanent snow, Glaciers
  11. Water bodies

Each land cover class is represented as percentage cover in a 30 arc-second grid cell.

The six global land cover products include ESRI (Karra et al., 2021), FROM-GLC (Zhao et al., 2021), GLAD-Map (Potapov et al., 2022), GLC-FCS30 (Zhang et al., 2021), GLOBELAND30 (Chen et al., 2015) and WORLDCOVER (Zanaga et al., 2021). These remotely sensed maps were compiled by the FAO Statistics Division (FAO-ESS), which produced agreement maps for cropland (Tubiello et al., 2023; FAO, 2024) and tree-covered areas.

For the GAEZ v5 consolidated land use database circa 2020, calibration procedures are applied so that the calibrated raster totals for cropland and tree-covered areas (forests) correspond to the average of the 2019–2021 statistics reported by FAOSTAT. Appendix 2-1 presents details of the calibration of the cropland layer. A similar approach was followed to derive a calibrated layer of tree-covered areas from the forest classes provided in the high-resolution land cover datasets. As for cropland, the calibration is performed so that the total forest extent in the calibrated raster matches the 2019–2021 average reported by the UN Food and Agriculture Organization in FAOSTAT. The remaining land cover classes are based on WORLDCOVER but, if necessary, adjusted to match the cropland and forest areas.

The GAEZ v5 land cover layers also include information on area equipped for irrigation, derived from the ‘Global Area Equipped for Irrigation Dataset’. This dataset uses the latest sub-national irrigation statistics (covering 17298 administrative units) from various official sources to develop a gridded (5 arc-minute resolution) global product of Area Equipped for Irrigation for the years 2000, 2005, 2010, and 2015 (Mehta et al., 2024). The consolidated land use database, including areas equipped for irrigation, provides key inputs for the downscaling procedures used in Module VI (see Chapter 8) to spatially allocate actual statistical production of the period 2019–21.

See GAEZ v5 dataset on land cover class on FAO Agro-Informatic Data Catalog (link).


GAEZ v5 ‘exclusion’ layer

Land has many important functions. GAEZ outputs emphasize the suitability of land for crop production. Planning for more and better food supplies, produced with fewer resources, causing less environmental impact and safeguarding biodiversity, will have to continue with high priority in the next decades. GAEZ v5 respects land marked by a protection/exclusion status or with recognized biodiversity value. In Module V (see Chapter 7) it applies an ‘exclusion’ layer, which has been compiled from three up-to-date and authoritative international datasets: the World Database on Protected Areas (UNEP-WCMC and IUCN, 2017), the World Database of Key Biodiversity Areas (International, 2022; IUCN, 2016) provided by the Integrated Biodiversity Assessment Tool (IBAT), and a CIFOR/GLWD wetland layer. The CIFOR Tropical Wetland Maps (Gumbricht et al., 2017) cover the tropical and subtropical regions (40° N to 60° S; 180° E to -180° W), excluding small islands. For areas outside the CIFOR domain, GAEZ uses the Global Lakes and Wetlands Database (Lehner and Döll, 2004).

World Database on Protected Areas (WDPA)

The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas. It is a joint project between UN Environment Programme (UNEP) and the International Union for Conservation of Nature (IUCN). The UNEP World Conservation Monitoring Centre (UNEP-WCMC) compiles and manages the WDPA, in collaboration with governments, non-governmental organizations, academia and industry. The WDPA is updated monthly. In October 2010, UNEP-WCMC launched the social media-based website Protected Planet, which allows users to interact with and improve the data that is currently recorded on the WDPA.

The resource database of GAEZ v5 includes data of the February 2023 update of WDPA. For use in GAEZ v5, all polygons were summarized into two classes depending on whether the category field in the database indicated one of the established IUCN categories (class 1) or not (class 2). The polygon data were rasterized at 30 arc-seconds and a narrow buffer of 30 arc-seconds was drawn around each protected area (class 3).
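The buffering step can be illustrated on a small rasterized grid. In this sketch, 0 means unprotected, 1 and 2 are the two WDPA classes described above, and 3 marks the one-cell buffer. Treating all eight neighbouring cells as part of the buffer is an assumption, and the function name is hypothetical.

```python
def add_buffer(grid, buffer_value=3):
    """Mark unprotected cells adjacent to a protected cell (class 1 or 2) as buffer."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:
                continue  # protected cells keep their class
            # Assumption: 8-cell neighbourhood, clipped at the grid edges
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if any(value in (1, 2) for value in neighbours):
                out[r][c] = buffer_value
    return out


# Hypothetical raster with a single protected cell of class 1
wdpa = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
buffered = add_buffer(wdpa)
for row in buffered:
    print(row)
```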

The World Database of Key Biodiversity Areas (KBA)

Key Biodiversity Areas (KBAs) are sites that contribute significantly to the global persistence of biodiversity. Quoting KBA Standards and Appeals Committee (IUCN, 2019): “The criteria used to identify KBAs incorporate elements of biodiversity across genetic, species and ecosystem levels, and are applicable to terrestrial, freshwater, marine and subterranean systems. KBAs have delineated boundaries and are actually or potentially manageable as a unit. KBAs provide an effective bridge between assessment processes and conservation planning and an important step towards conservation action. However, the process of KBA identification and delineation does not include steps to advance management activity and does not imply that any specific conservation action, such as protected area designation, is required.” The 2017 update of the World Database of Key Biodiversity Areas includes more than 15,000 polygons of delineated KBAs. The GAEZ v5 ‘exclusion’ layer includes an inventory of KBA locations outside WDPA protected areas in order to draw attention to recognized high biodiversity values when assessing land for potential agricultural production.

Permanent Wetlands

Wetlands play a fundamental role in climate change mitigation, provide essential ecosystem services, and are of major importance for biodiversity, water quality, and flood control. The GAEZ v5 exclusion layer includes the CIFOR/GLWD wetland layer as one exclusion class. The CIFOR Tropical Wetland Maps (Gumbricht et al., 2017) cover the tropical and subtropical regions (40° N to 60° S; 180° E to -180° W). For all other areas, the Global Lakes and Wetlands Database (Lehner and Döll, 2004), the main source of wetlands in GAEZ v4, was used.

The CIFOR Tropical Wetland Maps database is part of the Sustainable Wetlands Adaptation and Mitigation Program (SWAMP) led by the Center for International Forestry Research (CIFOR). This database provides detailed information on the distribution of tropical and subtropical wetlands, including peatlands and peat depth, across the globe. It includes various types of wetlands such as mangroves, swamps, fens, riverine and lacustrine floodplains, and marshes.

GAEZ v5 incorporates GLWD Level 3 data (GLWD-3), which comprises lakes, reservoirs, rivers and different wetland types in the form of a global raster map at 30 arc-second resolution. The GLWD-3 dataset has 12 classes as follows: (1) Lake; (2) Reservoir; (3) River; (4) Freshwater Marsh, Floodplain; (5) Swamp Forest, Flooded Forest; (6) Coastal Wetland (incl. Mangrove, Estuary, Delta, Lagoon); (7) Pan, Brackish/Saline Wetland; (8) Bog, Fen, Mire (Peatland); (9) Intermittent Wetland/Lake; (10) 50–100% Wetland; (11) 25–50% Wetland; (12) 0–25% Wetland. GLWD classes 4–9 are included in the GAEZ v5 permanent wetlands layer for all regions not covered by the CIFOR Tropical Wetland Maps.
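The choice between the two wetland sources can be sketched per grid cell as below. The set of GLWD classes comes from the text; the function name and its boolean inputs (whether the cell lies in the CIFOR domain and whether CIFOR maps it as wetland) are hypothetical simplifications.

```python
GLWD_PERMANENT_WETLAND_CLASSES = {4, 5, 6, 7, 8, 9}  # GLWD-3 classes used by GAEZ v5


def is_permanent_wetland(in_cifor_domain, cifor_is_wetland, glwd_class):
    """Use CIFOR inside its domain, otherwise fall back to GLWD-3 classes 4-9."""
    if in_cifor_domain:
        return cifor_is_wetland
    return glwd_class in GLWD_PERMANENT_WETLAND_CLASSES


print(is_permanent_wetland(True, True, 1))    # CIFOR decides inside its domain
print(is_permanent_wetland(False, False, 8))  # peatland from GLWD outside the domain
print(is_permanent_wetland(False, False, 1))  # lakes are not counted as wetlands
```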

Compilation of the GAEZ v5 ‘exclusion’ layer

An ‘exclusion’ layer, which marks land with a protection status or with high biodiversity value, has been compiled from the data sources introduced in the previous sections, namely the World Database on Protected Areas, the World Database of Key Biodiversity Areas and the CIFOR/GLWD wetland layer. Both the WDPA and KBA databases were obtained in February 2023. In addition, areas of major tree cover (>80%) are marked.

The ‘exclusion’ layer distinguishes six classes at 30 arc-second resolution, which were defined in a hierarchical, step-by-step procedure. In the first step, all grid cells with protection status were extracted from WDPA and recorded in two classes (depending on whether an IUCN category was indicated or not). Second, additional grid cells were extracted for locations falling into a KBA polygon. The third step marked grid cells outside protected areas and KBA polygons that were part of the CIFOR/GLWD wetland layer. Finally, areas with more than 80% tree cover in the consolidated land use database were included in the ‘exclusion’ layer, if not already assigned a class value by the previous steps. All remaining land is assigned to the ‘no exclusion’ class.

The six classes in the exclusion layer are as follows: (1) No exclusion; (2) IUCN category in WDPA; (3) WDPA, not an IUCN category; (4) KBA, outside WDPA protected area; (5) Permanent wetlands, outside protected area and KBA polygon; and (6) Areas of >80% tree cover as shown in the consolidated land cover database, outside WDPA, KBA and permanent wetlands.
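The hierarchy amounts to a first-match rule per grid cell, sketched below with the class numbers listed above. The function name and the per-cell inputs are hypothetical; in practice the rule would be applied to whole rasters.

```python
def exclusion_class(wdpa_iucn, wdpa_other, in_kba, is_wetland, tree_cover_pct):
    """Assign one of the six exclusion classes; earlier rules take precedence."""
    if wdpa_iucn:
        return 2  # IUCN category in WDPA
    if wdpa_other:
        return 3  # WDPA, not an IUCN category
    if in_kba:
        return 4  # KBA outside WDPA protected areas
    if is_wetland:
        return 5  # permanent wetland outside protected areas and KBAs
    if tree_cover_pct > 80:
        return 6  # more than 80% tree cover, not captured by earlier steps
    return 1      # no exclusion


# A wetland cell inside a KBA is reported as KBA (class 4), not as wetland
print(exclusion_class(False, False, True, True, 95.0))
```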

See the GAEZ v5 exclusion layer dataset on FAO Agro-Informatic Data Catalog (link).