08. Module VI (Actual yield and production) - un-fao/gaezv5 GitHub Wiki
Introduction
Global change processes raise new estimation problems challenging the conventional statistical methods. These methods are based on the ability to obtain observations from unknown true probability distributions, whereas the new problems require recovering information from only partially observable or even unobservable variables. For instance, aggregate data exist at global and national level regarding agricultural production. ‘Downscaling’ methods in this case should achieve plausible estimation of spatial distributions, consistent with ‘local’ data obtained from remote sensing, available aggregate agricultural statistics, and other available evidence. For this purpose, a flexible sequential downscaling method, based on iterative rebalancing, was developed at IIASA and implemented for use in GAEZ. The information flow associated with the spatial allocation of agricultural statistics is sketched in the figure below.
Figure 8-1 Information flow in Module VI
Downscaling of agricultural statistics to grid cells
Agricultural production and land statistics are available at national scale from FAO, but these statistical data do not reflect the spatial heterogeneity of agricultural production systems at finer resolutions, e.g., grid cells, within country boundaries. In this case a “downscaling” method is needed for attribution of aggregate national production statistics to individual spatial units (grid cells) by applying formal methods that account for land characteristics, assess possible production options and can use available evidence from observed or inferred geo-spatial information, including remotely sensed land cover, soil, climate and vegetation distribution, population density and distribution, etc.
Land cover data products are classifications that provide detailed geographical information on, among other things, the distribution of cropland. Besides land cover/use data, other important information exists on factors that significantly affect the patterns and intensities of crop production, for example spatially explicit biophysical data related to land constraints (such as soil type and terrain slope), land productivity for specific agricultural activities, human population distribution, and prices received by farmers. Such data, in combination with GAEZ crop suitability and potential attainable yield layers, were used in the downscaling procedures to construct a prior distribution for the allocation of agricultural cropping activities and production.
To achieve consistency of available data and estimates across scales, the sequential rebalancing procedures that were developed at IIASA rely on appropriate optimization principles (Fischer et al., 2006a) and combine the available statistics with the calculated “prior” and other hard (accounting identities) and soft (expert opinion) constraint data.
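As an illustration only, one common way to formalise the combination of a prior with accounting constraints is to minimise the relative entropy between the allocation and the prior. The symbols below are assumptions introduced for this sketch and are not taken from Fischer et al. (2006a): $x_{ij}$ is the area of crop $j$ allocated to grid cell $i$, $q_{ij}$ is the prior, $A_j$ is the statistical total for crop $j$, and $C_i$ is the cropping capacity of cell $i$.

```latex
\min_{x \ge 0} \; \sum_{i} \sum_{j} x_{ij} \ln \frac{x_{ij}}{q_{ij}}
\quad \text{subject to} \quad
\sum_{i} x_{ij} = A_j \;\; \forall j,
\qquad
\sum_{j} x_{ij} \le C_i \;\; \forall i
```

In this generic form the equality constraints play the role of the hard accounting identities, while soft information enters through the prior $q_{ij}$.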
To guide the spatial allocation of crops, GAEZ procedures for the calculation of potential yields and production have been applied separately to the rain-fed and irrigated cropland shares of individual 5 arc-minute grid cells. Rather than taking an average yield for the entire grid cell, it is assumed that the cultivated land occupies the better part of the suitability distribution determined in each grid cell. Estimating consistent spatial yield patterns of currently cultivated crops by grid cell requires joint downscaling of agricultural statistics for all crops simultaneously. The sequential downscaling consists of efficient iterative rebalancing procedures (Fischer et al., 2006b) based on cross-entropy maximization principles. These procedures allocate the production reported in crop statistics to appropriate tracts of rain-fed and irrigated cropland, respectively, while providing realistic estimates of current yield and production for the cropland in individual grid cells, consistent with the land’s spatial distribution and agronomic capabilities.
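The assumption that cultivated land occupies the better part of the suitability distribution can be illustrated with a minimal sketch. The function name, the representation of a cell as (area share, potential yield) pairs, and the example numbers are all hypothetical; GAEZ's actual data structures are not described here.

```python
def yield_on_best_land(classes, cultivated_share):
    """Average potential yield on the best `cultivated_share` of a grid cell.

    classes: list of (area_share, potential_yield) pairs describing the
             suitability distribution within one cell (shares sum to <= 1).
    cultivated_share: fraction of the cell that is cropland (0..1).
    """
    remaining = cultivated_share
    production = 0.0
    # Fill the cropland from the highest-yielding class downwards.
    for share, potential_yield in sorted(classes, key=lambda c: c[1], reverse=True):
        if remaining <= 0:
            break
        used = min(share, remaining)
        production += used * potential_yield
        remaining -= used
    allocated = cultivated_share - max(remaining, 0.0)
    return production / allocated if allocated > 0 else 0.0


# Hypothetical cell: 20 % of the area yields 8 t/ha, 30 % yields 5, 50 % yields 2.
cell = [(0.2, 8.0), (0.3, 5.0), (0.5, 2.0)]
best = yield_on_best_land(cell, 0.3)   # yield on the best 30 % of the cell
mean = sum(s * y for s, y in cell)     # whole-cell average, for comparison
```

With these numbers the yield on the best 30 % of the cell is well above the whole-cell average, which is why the choice between the two assumptions matters for the allocation.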
In summary, two main steps were involved in obtaining downscaled grid-cell level area, yield and production of main crops:
- Compilation of calibrated shares of rain-fed and irrigated cropland by 30 arc-second grid cell (and aggregation to 5 arc-minute grid cells), and
- Attribution of crop-specific harvested area, yield and production to the rain-fed and irrigated cropland of each grid cell.
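The aggregation from 30 arc-seconds to 5 arc-minutes mentioned in the first step corresponds to blocks of 10 x 10 fine cells (300 / 30 = 10). A minimal sketch follows, assuming that all fine cells in a block have equal area so that a plain mean is adequate; the function name and the nested-list grid are hypothetical.

```python
def aggregate_shares(fine, factor=10):
    """Average a fine-resolution share grid over factor x factor blocks."""
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [fine[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 10 x 20 grid of cropland shares at 30 arc-seconds:
# the five westernmost columns are fully cropped, the rest are not.
fine = [[1.0 if c < 5 else 0.0 for c in range(20)] for r in range(10)]
coarse = aggregate_shares(fine)   # one row of two 5 arc-minute cells
```

An area-weighted mean would be the more careful choice where cell area varies noticeably with latitude inside a block.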
Calibration of rain-fed and irrigated cropland shares
For the estimation of cropland shares in individual 5 arc-minute grid cells, data from the GAEZ v5 consolidated land use database were used. In the first step, the available land cover interpretations are combined to quantify each grid cell in the spatial raster in terms of twelve main land use/land cover shares. These shares are for:
- Built-up land, artificial surfaces
- Cropland
- Grassland
- Tree-covered areas
- Shrub-covered areas
- Shrub/Herbaceous, regularly flooded
- Tree-covered, regularly flooded, saline
- Lichen and mosses
- Bare or sparsely vegetated land
- Permanent snow, glaciers
- Water bodies
- Cropland equipped with full control irrigation
The estimation of cropland shares by 30 arc-second grid cell used in GAEZ employs an approach to formally and consistently integrate up-to-date geographical data sets obtained from remote sensing with statistical information compiled by FAO and/or national statistical bureaus, as a basis for spatially detailed downscaling of agricultural production statistics to land units (grid cells) and subsequent yield gap analysis. This information is needed to prevent double counting of available resources and is essential for various environmental assessments requiring spatial detail.
An iterative calculation procedure was used to estimate land cover class weights, consistent with aggregate FAO land statistics and spatial land cover patterns obtained from remotely sensed data. The procedure involves a sequence of steps, as follows:
- Collection of national (and possibly sub-national) statistics on cropland;
- Integration of land cover data sets;
- Spatial aggregation of geographical land cover data to obtain distributions of land cover classes at the level of national and sub-national administrative units for which statistical data is available;
- Cross-sectional regressions of statistical cropland against land cover distributions derived from geographical land cover data sets to obtain reference weights for each land cover class in terms of cultivated land contained;
- Estimation of urban/built-up land shares based on an empirical relationship of per capita land requirements as a function of population density, by application to a spatially detailed population density dataset at 30 arc-seconds and aggregation of results to 5 arc-minute grid cells;
- Application of an iterative procedure for the adjustment of land cover class weights, starting from estimated reference values, to achieve consistency of geographical and statistical data, i.e., such that weighted summation of land cover classes of an allocation unit (country or sub-national administrative unit) results in the total cropland as reported in the statistical data. This procedure is first run to calibrate irrigated cropland with AQUASTAT statistics and is then applied to cropland reported in FAOSTAT (keeping calibrated irrigated land fixed), and
- Adjustment of remaining land cover shares (i.e., excluding cropland, urban/built-up land and water bodies) to ensure consistency such that all land cover shares sum up to 100 % in each grid cell.
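The final step above, adjusting the remaining land cover shares so that each grid cell sums to 100 %, can be sketched as follows. Proportional scaling of the non-fixed classes is an assumption made for this illustration (the adjustment rule is not specified in the text), and the class keys are hypothetical short names.

```python
# Classes whose shares are kept fixed once they have been calibrated.
FIXED = {"cropland", "built_up", "water"}


def rebalance_shares(shares, fixed=FIXED):
    """Scale non-fixed land cover shares so that the cell totals 100 %."""
    fixed_total = sum(v for k, v in shares.items() if k in fixed)
    other_total = sum(v for k, v in shares.items() if k not in fixed)
    target = 100.0 - fixed_total
    if target < 0:
        raise ValueError("fixed classes already exceed 100 %")
    if other_total == 0:
        return dict(shares)  # nothing to scale
    scale = target / other_total
    return {k: (v if k in fixed else v * scale) for k, v in shares.items()}


# Hypothetical cell whose shares sum to 110 % after cropland calibration.
cell = {"cropland": 40.0, "built_up": 5.0, "water": 5.0,
        "grassland": 30.0, "tree_covered": 30.0}
balanced = rebalance_shares(cell)
```

In this example the calibrated cropland, built-up and water shares are unchanged, while grassland and tree-covered areas are reduced in proportion.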
Land cover class weights define for each land cover class and spatial allocation unit (e.g., country) the contents of a land cover class in terms of cropland. Starting values of class weights for the cropland class used in the iterative procedure were obtained by cross-country regression of statistical data of cropland against aggregated extents of national land cover class distributions obtained from GIS.
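To illustrate how a starting class weight could be derived, the sketch below fits a single weight by least squares through the origin, so that reported cropland is approximately the weight times the mapped extent of the cropland class. This is a deliberate simplification: the regression described above uses the full land cover class distribution, and the function name and country figures here are hypothetical.

```python
def reference_weight(mapped_extent, reported_cropland):
    """Least-squares slope through the origin: reported ~ w * mapped."""
    sxy = sum(x * y for x, y in zip(mapped_extent, reported_cropland))
    sxx = sum(x * x for x in mapped_extent)
    if sxx == 0:
        raise ValueError("mapped extents are all zero")
    return sxy / sxx


# Hypothetical figures for three countries (same area unit in both lists).
mapped = [100.0, 200.0, 400.0]     # extent of the mapped cropland class
reported = [75.0, 150.0, 300.0]    # cropland reported in statistics
w = reference_weight(mapped, reported)
```

A weight below 1 indicates that, on average, only part of the mapped cropland class corresponds to statistically reported cropland.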
The iterative algorithm for adjusting land cover weights is controlled by a parameter file specifying three levels of increasingly wider intervals within which the respective class weights can be adjusted. The ranges of permissible class weights for each land cover category were defined by (i) where possible, quantitative information contained in the GLC-Share legend class description, and (ii) expert judgment on the plausibility and possible magnitude of the presence of cultivated land in different land cover classes.
For instance, the weight used for cultivated land contained in the cropland class (02) would in a first step be adjusted within the interval [0.65, 0.85], with the cultivated land content of all other classes kept at 0. If this adjustment is insufficient, the interval [0.50, 0.95] is tested and small amounts of cropland can also be considered in grassland and shrubland areas. In the final step, the class weight for cropland is chosen from the interval [0.25, 1.00] and the permissible amount in some other classes is increased. Note that in most countries this last step was not necessary and a solution was found in the first or second step. In this way the algorithm not only produces formally consistent results for each allocation unit but also provides an indication of the discrepancy between mapped land cover distributions and statistical amounts of cropland.
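The three-level search can be sketched as below. The cropland intervals are those quoted above; the upper bounds for grassland and shrubland (0.05 and 0.15) are hypothetical placeholders, and moving every weight by the same fraction of its interval is an assumption of this sketch rather than the rule used in GAEZ.

```python
# Permissible weight intervals per level; non-cropland bounds are hypothetical.
LEVELS = [
    {"cropland": (0.65, 0.85)},
    {"cropland": (0.50, 0.95), "grassland": (0.0, 0.05), "shrubland": (0.0, 0.05)},
    {"cropland": (0.25, 1.00), "grassland": (0.0, 0.15), "shrubland": (0.0, 0.15)},
]


def calibrate_weights(extent, target, levels=LEVELS):
    """Return (level, weights) so that sum(weight * extent) equals target.

    extent: mapped area per land cover class in one allocation unit.
    target: cropland reported in statistics for that unit.
    """
    for level, bounds in enumerate(levels, start=1):
        lo = sum(b[0] * extent.get(k, 0.0) for k, b in bounds.items())
        hi = sum(b[1] * extent.get(k, 0.0) for k, b in bounds.items())
        if lo <= target <= hi:
            # Move all weights the same fraction t along their intervals.
            t = (target - lo) / (hi - lo) if hi > lo else 0.0
            return level, {k: b[0] + t * (b[1] - b[0]) for k, b in bounds.items()}
    return None, {}  # no level can reconcile map and statistics


extent = {"cropland": 1000.0, "grassland": 2000.0, "shrubland": 500.0}
level_a, weights_a = calibrate_weights(extent, 800.0)   # solved at level 1
level_b, weights_b = calibrate_weights(extent, 900.0)   # needs level 2
```

The returned level serves as the discrepancy indicator mentioned above: the higher the level needed, the larger the mismatch between mapped land cover and reported cropland.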
See the GAEZ v5 dataset on Share of land cover class on the FAO Agro-Informatic Data Catalog (link).
Attribution of crop production statistics to current cropland
Agricultural crop production data are available at national scale from FAOSTAT. Sub-national information was collected and compiled from national institutions and from CROPGRIDS (Tang et al., 2023). The spatial occurrence of rain-fed and irrigated cropland compatible with aggregate statistical data was established in the previous step. The main objective of the second step is to allocate crop production statistics to the spatial cropland units while meeting statistical accounts and respecting crop suitability and land capabilities reflected in the spatial land resources inventory.
The algorithm can be summarized as follows. The potential suitability of individual crops in the cropland of each grid cell is available from the geographically detailed GAEZ assessments undertaken in Modules I to V for the different input levels and water sources (i.e., rain-fed and irrigated), including estimates of agronomically attainable crop yields. The crop production statistics and the spatial information available for each country were used to calculate an initial estimate of crop-wise area allocation and production, a so-called “prior”. The priors are subsequently revised in an iterative procedure. Each iteration step determines the discrepancy between the statistical totals available at the level of spatial units (countries or sub-national units) and the respective totals calculated by summing harvested areas and production over grid cells. The magnitude of these deviations is used to revise the land and crop allocation and to recalculate the discrepancies. The process continues until all accounting constraints are met (Fischer et al., 2006) and the crop distribution and production (i) are consistent with aggregate statistical data on crop harvested area and production; (ii) are allocated to the available rain-fed and irrigated cropland, taking into account its capacity to support multi-cropping under rain-fed and irrigated conditions, respectively; and (iii) agree with ancillary sub-national data, in particular selected crop area distribution data and the agro-ecological suitability of crops as estimated in GAEZ v5. A mathematical description of the iterative rebalancing method used for downscaling is given in Appendix 8-1 Downscaling of area, production and yield of crops.
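The loop of measuring discrepancies and revising the allocation can be illustrated with a simple proportional rebalancing sketch. This is not the method of Appendix 8-1: it handles harvested area only, uses plain alternating scaling, and all names and numbers are hypothetical.

```python
def rebalance(prior, crop_totals, cell_capacity, tol=1e-8, max_iter=1000):
    """Revise a prior allocation until crop totals and cell capacities hold.

    prior[i][j]: prior harvested area of crop j in grid cell i.
    crop_totals[j]: statistical harvested area of crop j for the unit.
    cell_capacity[i]: maximum harvested area that cell i can support.
    """
    x = [row[:] for row in prior]
    n_cells, n_crops = len(x), len(crop_totals)
    for _ in range(max_iter):
        # Scale each crop so that its summed area matches the statistics.
        for j in range(n_crops):
            col = sum(x[i][j] for i in range(n_cells))
            if col > 0:
                f = crop_totals[j] / col
                for i in range(n_cells):
                    x[i][j] *= f
        # Scale down cells whose allocated area exceeds their capacity.
        for i in range(n_cells):
            used = sum(x[i])
            if used > cell_capacity[i]:
                f = cell_capacity[i] / used
                x[i] = [v * f for v in x[i]]
        # Largest remaining discrepancy against the statistical totals.
        gap = max(abs(crop_totals[j] - sum(x[i][j] for i in range(n_cells)))
                  for j in range(n_crops))
        if gap < tol:
            return x
    raise RuntimeError("rebalancing did not converge")


prior = [[3.0, 1.0], [1.0, 1.0]]   # two cells, two crops
alloc = rebalance(prior, crop_totals=[6.0, 4.0], cell_capacity=[6.0, 5.0])
```

In this example the first cell reaches its capacity, so part of both crops is shifted to the second cell while the relative crop mix suggested by the prior is preserved as far as the constraints allow.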
For the estimation of rain-fed and irrigated crops in individual 5 arc-minute grid cells, three main data sources provided prior information: (i) SPAM 2020, Spatial Production Allocation Model (Zhe Guo et al., 2024); (ii) MIRCA2000, Monthly Irrigated and Rainfed Crop Areas (Portmann et al., 2011); and (iii) CROPGRIDS (Tang et al., 2023). These gridded datasets were scrutinized for their suitability in the downscaling procedures in terms of the crops included, the feasibility of their spatial patterns, and their consistency with national statistics. Each crop prior layer was used to provide information on the spatial distribution of crops within an administrative unit.
Description of Module VI outputs
The downscaling procedures and their implementation using agricultural statistics for 2019–2021 have resulted in the following data sets:
- A global inventory of shares of rain-fed and irrigated cropland at 30 arc-seconds. The inventory is consistent at national level with FAO land use statistics (arable land, land under permanent crops, land equipped for full control irrigation) for 2019–2021;
- Mapped distribution of harvested area, yield and production at 5 arc-minute resolution for all major crops in rain-fed cropland, based on FAO statistics for 2019–2021;
- Mapped distribution of harvested area, yield and production at 5 arc-minute resolution for all major crops in irrigated cropland, based on FAO statistics for 2019–2021, and
- Estimates of the spatial distribution of total crop production value and the production values of major crop groups (cereals, root crops, oil crops).
The results of the spatial attribution of crop statistics for 2019–2021 undertaken in Module VI are stored as GIS rasters of 5 arc-minute grid cells, separately for 33 crops/crop groups and for total, rain-fed and irrigated cropland. The raster data were produced for harvested area, production and implied average crop yield (i.e., yield = production/harvested area).
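The implied yield relation can be written out as a small sketch. The nested-list rasters and the use of `None` as a no-data value for cells without harvested area are assumptions made for this illustration.

```python
def implied_yield(production, harvested_area):
    """Cell-wise yield = production / harvested area (None where area is 0)."""
    return [[p / a if a > 0 else None for p, a in zip(prod_row, area_row)]
            for prod_row, area_row in zip(production, harvested_area)]


# Hypothetical 1 x 2 rasters: production in tonnes, harvested area in hectares.
production = [[120.0, 0.0]]
area = [[40.0, 0.0]]
yields = implied_yield(production, area)
```

Because yield is derived from the two downscaled quantities, a yield for total cropland should be computed from summed rain-fed and irrigated production and area rather than by averaging the two yield rasters.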
Note that the downscaled production from FAOSTAT statistics covers all recorded crop production activities. Attributing it to the rain-fed and irrigated land units of the resource inventory that represent statistical physical cropland (arable land and land under permanent crops) captures the entire resource use intensity of crop production (multiple cropping and/or fallowing). This avoids the incomplete or double counting of available resources that may occur if only selected commodities were downscaled.
The table below shows the 33 commodities downscaled from statistical data to spatial rasters at 5 arc-minute resolution in GAEZ v5. It lists how each downscaled commodity relates to the items recorded in the primary crop production domain of the FAOSTAT database (http://www.fao.org/faostat/en/#data/QC), from which average 2019–2021 statistical values were extracted as input data for downscaling.
Table: Crop Groups with Crops and Average Prices (I$2015/ton)
Crop Group Name | Crops | Avg. Price (I$2015/ton) |
---|---|---|
Wheat | Wheat | 236.83 |
Rice | Rice | 391.07 |
Maize | Maize (corn) | 200.73 |
Sorghum | Sorghum | 217.23 |
Millet | Millet | 300.10 |
Barley | Barley | 194.83 |
Other Cereals | Rye, Oats, Triticale, Buckwheat, Fonio, Quinoa, Canary seed, Mixed grain, Cereals n.e.c. | 491.08 |
Potatoes & Sweet Potatoes | Potatoes, Sweet potatoes | 228.48 |
Cassava | Cassava, fresh | 145.28 |
Other Roots and Tubers | Yams, Taro, Yautia, Edible roots and tubers n.e.c., Chicory roots | 443.15 |
Sugar Beet | Sugar beet | 47.11 |
Sugar Cane | Sugar cane | 44.88 |
Pulses | Beans (dry), Broad beans (dry), Chick peas, Lentils, Peas (dry), Cow peas, Pigeon peas, Bambara beans, Other pulses n.e.c. | 589.10 |
Soya Beans | Soya beans | 382.46 |
Rape or Colza Seed | Rape or colza seed | 470.18 |
Sunflower Seed | Sunflower seed | 488.09 |
Groundnuts | Groundnuts, excluding shelled | 723.77 |
Sesame Seed | Sesame seed | 1132.69 |
Oil Palm Fruit | Oil palm fruit | 97.94 |
Coconuts | Coconuts, in shell | 163.21 |
Other Oil Seeds | Linseed, Mustard seed, Safflower seed, Castor oil seeds, Poppy seed, Melonseed, Hempseed, Other oil seeds n.e.c., Olives, Karite nuts, Tung nuts, Jojoba seeds, Tallowtree seeds | 668.23 |
Seed Cotton | Seed cotton, unginned | 900.90 |
Tobacco | Unmanufactured tobacco | 2239.66 |
Bananas | Bananas, Plantains and cooking bananas | 366.06 |
Coffee | Coffee, green | 2089.67 |
Cocoa | Cocoa beans | 1471.52 |
Tea and Maté | Tea leaves, Maté leaves | 1127.01 |
Tomatoes | Tomatoes | 476.12 |
Other Vegetables | Asparagus, Cabbages, Cauliflowers and broccoli, Lettuce and chicory, Spinach, Artichokes, Chillies and peppers (green), Cucumbers and gherkins, Eggplants, Pumpkins/squash/gourds, Okra, String beans, Other beans (green), Peas (green), Broad beans and horse beans (green), Carrots and turnips, Green garlic, Green onions and shallots, Dry onions and shallots, Leeks, Green corn (maize), Other vegetables n.e.c., Locust beans (carobs) | 564.11 |
Other Pulses | Vetches, Lupins | 333.68 |
Fruits & Nuts (General) | Watermelons, Melons, Avocados, Dates, Figs, Mangoes, Papayas, Pineapples, Citrus (oranges, lemons, mandarins, etc.), Grapes, Apples, Pears, Stone fruits (plums, cherries, peaches, etc.), Berries (strawberries, blueberries, raspberries, etc.), Nuts (almonds, pistachios, walnuts, etc.), Other fruits n.e.c. | 949.22 |
Natural Rubber | Natural rubber in primary forms | 1256.88 |
Other Industrial/Stimulant Crops | Kapok fruit, Jute, Kenaf, Flax, True hemp, Ramie, Sisal, Agave fibres, Abaca, Fibre crops n.e.c., Spices (pepper, ginger, vanilla, cinnamon, etc.), Peppermint, Pyrethrum, Other spice/aromatic crops n.e.c. | 1250.72 |
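The average prices in the table allow downscaled production to be expressed in value terms. The sketch below assumes that production value is simply production in tonnes multiplied by the average price; the function name and the example cell are hypothetical, while the three prices are taken from the table above.

```python
# Average prices in I$2015 per tonne, copied from the table above.
PRICES = {"Wheat": 236.83, "Rice": 391.07, "Maize": 200.73}


def production_value(production_tonnes, prices=PRICES):
    """Total value (I$2015) of the crops produced in one grid cell."""
    return sum(tonnes * prices[crop] for crop, tonnes in production_tonnes.items())


# Hypothetical grid cell producing 100 t of wheat and 50 t of maize.
cell = {"Wheat": 100.0, "Maize": 50.0}
value = production_value(cell)
```

Restricting the dictionary to the cereal, root crop or oil crop rows of the table would give the corresponding crop group values in the same way.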