User_Guide - Rwema25/AE-project GitHub Wiki

User Guide & Documentation: Climate Data Toolkit

1. Introduction & Project Context

1.1 Overview

The “Advancing Climate Data Integration in Agroecological Research” project, funded by the McKnight Foundation and implemented by the Alliance of Bioversity International and CIAT (ABC), in partnership with the African Institute of Mathematical Sciences (AIMS), seeks to improve the capacity of agroecological (AE) researchers and practitioners to integrate climate data into their work. By bridging the gap between raw meteorological data and actionable field insights, the project supports the design of resilient, diversified farming systems.

1.2 The Toolkit Purpose

Rather than creating new datasets, this toolkit serves as a consolidation and simplification layer for existing climate resources, such as the CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) dataset. It provides a streamlined, user-friendly interface to:

  • Automate data retrieval and standardization.
  • Calculate complex climate statistics and hazard indices (e.g., heat stress).
  • Align climate data with specific agronomic windows and crop cycles.
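As a flavour of the hazard indices involved, a heat-stress indicator can be as simple as counting the days whose maximum temperature exceeds a threshold. A minimal illustration on synthetic data (the 35 °C threshold and the series are illustrative, not the toolkit's actual definition):

```python
import numpy as np
import pandas as pd

# Synthetic daily maximum temperatures for one year (°C)
rng = np.random.default_rng(1)
tmax = pd.Series(30 + rng.normal(0, 4, 365))

# Count the days exceeding an illustrative 35 °C heat-stress threshold
heat_stress_days = int((tmax > 35).sum())
```

Real hazard indices layered on top of this idea (consecutive dry days, heat-wave spells, etc.) add run-length logic, but the thresholding step is the core.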

1.3 Target Audience

This documentation is designed for:

  • AE Researchers: Seeking to validate experimental findings with historical climate context.
  • Practitioners & Extension Agents: Designing adaptation strategies for smallholder farmers.
  • Decision Makers: Requiring evidence-based climate risk screenings for regional planning.

1.4 Datasets

To provide a comprehensive view of the agroecological environment, the toolkit integrates high-resolution historical data, future projections, and soil properties. The following tables detail the technical specifications, temporal extent, and sources for each dataset.

Rainfall & Temperature (Historical & Real-time)

These datasets are utilized for baseline historical analysis, monitoring recent trends, and identifying past climate hazards.

| Dataset | Variables | Spatial Res. | Temporal Res. | Coverage | Temporal Extent | Reference |
|---|---|---|---|---|---|---|
| CHIRPS v2.0 | Precipitation | 0.05° (~5.5 km) | Daily, pentadal (5-day), dekadal (10-day), monthly | Quasi-global (50°S–50°N) | 1981–Present | Climate Hazards Center (CHC), UCSB |
| CHIRTS v2.0 | Maximum and minimum temperature | 0.05° (~5.5 km) | Daily | Quasi-global (60°S–70°N) | 1983–Present | Climate Hazards Center (CHC), UCSB |
| AgERA5 | Standard ERA5 variables plus: evapotranspiration (actual/potential), soil water content (various depths), Leaf Area Index (LAI), Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), crop-specific indicators | 0.1° (~11 km) | Daily | Global | 1979–Present | AgERA5 |
| ERA5 | Air temperature (various levels), precipitation, surface radiation, wind speed/direction, soil moisture, evaporation, sea surface temperature, sea ice, and more | 0.25° (~31 km) | Hourly, monthly | Global | 1950–Present | Hersbach et al. (2020) |

Climate Projections (Future Scenarios)

This dataset supports forward-looking assessments to understand how agroecological systems may shift under different emission pathways. The toolkit utilizes a curated ensemble of 16 Global Climate Models (GCMs) to provide a robust range of future projections.

| Dataset | Variables | Spatial Res. | Temporal Res. | Coverage | Temporal Extent | Considered SSPs | Included GCM Models (16) | Reference |
|---|---|---|---|---|---|---|---|---|
| NEX-GDDP-CMIP6 | Precipitation, maximum and minimum temperature | 0.25° (~25 km) | Daily | Global | Historical: 1950–2014; Future: 2015–2100 | SSP1-2.6, SSP2-4.5, SSP5-8.5 | ACCESS-CM2, ACCESS-ESM1-5, CanESM5, CMCC-ESM2, EC-Earth3, EC-Earth3-Veg-LR, GFDL-ESM4, INM-CM4-8, INM-CM5-0, KACE-1-0-G, MIROC6, MPI-ESM1-2-LR, MRI-ESM2-0, NorESM2-LM, NorESM2-MM, TaiESM1 | NASA |

Soil Information

These datasets provide static soil physical and chemical properties required for calculating water holding capacity and nutrient availability.

| Dataset | Spatial Res. | Temporal Res. | Coverage | Depth Intervals | Temporal Extent | Reference |
|---|---|---|---|---|---|---|
| SoilGrids | 250 m | Static | Global | 0–200 cm | 2020 (v2.0) | SoilGrids |
| iSDAsoil | 30 m | Static | Africa | 0–50 cm | 2021 | Release |

2. Getting Started: Input Requirements

To begin working within the Visual Studio Code interface:

1. Open the Explorer sidebar if it is not already visible.
2. Click on the CLIMATE-TOOLKIT dropdown menu to reveal the available modules.


3. The expanded CLIMATE-TOOLKIT folder reveals several folders representing the toolkit's modules.
4. To start the data ingestion process for this phase, click on the relevant module folder within this list.


Module-by-Module Workflows

Module 1: fetch_data (Data Ingestion)

Goal: Retrieve and standardize climate data from external sources.

  • Workflow: The user works through the source_data $\rightarrow$ transform_data $\rightarrow$ preprocess_data functions in sequence.

5. Expanding the fetch_data folder reveals the specific functions available within the module.


6. To start the primary data fetching logic, click on the source_data function folder within this list. It reveals:

  • a list of the available datasets, and
  • the script file, source_data.py.


7. To execute the primary data sourcing logic, click on the source_data.py script to open the file for editing/configuration.

  • 2. Give the two-line command you first need to run in the terminal to activate the environment.
  • 3. Give the command you need to use to run `source_data.py` in the terminal.


  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ User inputs variables $\rightarrow$ Toolkit queries dataset API $\rightarrow$ source_data returns the data in raw format.

To ensure accurate outputs from the source_data function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end dates must be provided.

    • Example: 1991-01-01 --to 2020-12-31.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.
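The input formats above can be checked programmatically before any query is issued. A minimal validation sketch (the function name and return structure are illustrative, not part of the toolkit's API):

```python
from datetime import date

# Data sources accepted by the toolkit, per the list above
VALID_SOURCES = {"era_5", "nasa_power", "nex_gddp"}

def validate_inputs(lat, lon, start, end, source):
    """Check that inputs match the formats expected by source_data."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("Coordinates must be decimal degrees (WGS84).")
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d >= end_d:
        raise ValueError("Start date must precede end date.")
    if source not in VALID_SOURCES:
        raise ValueError(f"Unknown data source: {source}")
    return {"lat": lat, "lon": lon, "start": start_d, "end": end_d, "source": source}

# Example: Nairobi, Kenya over the 1991-2020 baseline
params = validate_inputs(-1.29, 36.82, "1991-01-01", "2020-12-31", "era_5")
```

Catching malformed inputs at this stage avoids silent failures when the toolkit later queries the dataset API.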


source_data Function Output

The source_data function returns the sourced data in its raw format, preserving original variable names and units.

Output Structure:

  • Data is returned as a DataFrame
  • Variable names match the original source (e.g., ERA5)
  • Units are retained as provided by the source

Example (era_5 Data):

  • Variables: ['date', 'total_precipitation', 'maximum_2m_air_temperature', 'minimum_2m_air_temperature']
  • Units:
    • date: datetime
    • precipitation: m (meter)
    • temperature: K (Kelvin)


8. To start the primary data transformation logic, click on the transform_data function folder within the list of functions. It reveals:

  • 2. the script file, transform_data.py.

  • 3. Give the command you need to use in the terminal to run transform_data.py.

  • 4. Shows the input command to be edited in the terminal.

  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit transforms the variables.

To ensure accurate outputs from the transform_data function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end dates must be provided.

    • Example: start 1991-01-01 --end 2020-12-31.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.


transform_data Function Output

The transform_data function returns the data in its original units, with only the variable names transformed to internal names.

Output Structure:

  • Data is returned as a DataFrame
  • Variable names change to internal variable names
  • Units are retained as provided by the source

All raw data variable names ingested by the toolkit are transformed into standardized internal variable names to ensure consistency across modules.

| Raw Variable (era_5) | Internal Variable Name | Description |
|---|---|---|
| total_precipitation (m) | Precipitation | Total daily precipitation |
| maximum_2m_air_temperature (K) | max_temperature | Maximum 2-meter air temperature |
| minimum_2m_air_temperature (K) | min_temperature | Minimum 2-meter air temperature |
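The renaming above amounts to a simple column mapping. A minimal pandas sketch (the mapping dict mirrors the table; the DataFrame values are illustrative, not real ERA5 data):

```python
import pandas as pd

# Mapping from raw era_5 names to internal toolkit names, per the table above
ERA5_RENAME = {
    "total_precipitation": "Precipitation",
    "maximum_2m_air_temperature": "max_temperature",
    "minimum_2m_air_temperature": "min_temperature",
}

# Illustrative raw frame as source_data would return it
raw = pd.DataFrame({
    "date": pd.to_datetime(["1991-01-01", "1991-01-02"]),
    "total_precipitation": [0.004, 0.000],          # metres
    "maximum_2m_air_temperature": [301.2, 302.5],   # Kelvin
    "minimum_2m_air_temperature": [288.9, 289.4],   # Kelvin
})

# Rename only; units are untouched at this stage
transformed = raw.rename(columns=ERA5_RENAME)
```

Keeping the units unchanged here is deliberate: unit conversion is the job of the next function, preprocess_data.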


9. To start the primary data preprocessing logic, click on the preprocess_data function folder within the list of functions. It reveals:

  • 2. the script file, preprocess_data.py.

  • 3. Give the command you need to use in the terminal to run preprocess_data.py.

  • 4. Shows the input command to be edited in the terminal.

  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit pre-processes the data.

To ensure accurate outputs from the preprocess_data function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end dates must be provided.

    • Example: start 1991-01-01 --end 2020-12-31.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.


preprocess_data Function Output

The preprocess_data function returns the converted data values, with transformed variable names and adjusted units.

Output Structure:

  • Data is returned as a DataFrame
  • Variable names change to internal variable names
  • Units are changed to standard units

All raw data ingested by the toolkit is pre-processed and transformed into a standardized internal format to ensure consistency across modules.

| Raw Variable (era_5) | Internal Variable Name | Unit | Description |
|---|---|---|---|
| total_precipitation (m) | Precipitation | mm | Total daily precipitation |
| maximum_2m_air_temperature (K) | max_temperature | °C | Maximum 2-meter air temperature |
| minimum_2m_air_temperature (K) | min_temperature | °C | Minimum 2-meter air temperature |
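The unit adjustments in the table above are simple arithmetic conversions. A minimal sketch (column names follow the internal names; the data values are illustrative):

```python
import pandas as pd

# Illustrative frame after transform_data: internal names, original units
df = pd.DataFrame({
    "Precipitation": [0.004, 0.012],     # metres, as delivered by era_5
    "max_temperature": [301.2, 302.5],   # Kelvin
    "min_temperature": [288.9, 289.4],   # Kelvin
})

# Convert to the toolkit's standard units
df["Precipitation"] *= 1000.0    # m  -> mm
df["max_temperature"] -= 273.15  # K  -> °C
df["min_temperature"] -= 273.15  # K  -> °C
```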


To get a better understanding of the module and its functions, you can review the source_data.py, transform_data.py, and preprocess_data.py scripts in the fetch_data folder.

Module 2: climatology (Location Profiling)

Goal: Establish a Climatological Standard Normal for a specific location.

  • Workflow: Period Filtering (continuous 30-year: January 1, 1991, to December 31, 2020) $\rightarrow$ Temporal Averaging (Calculates the Annual Mean Temperature and Annual Total Precipitation for each of the 30 years) $\rightarrow$ Normal Calculation (Computes the multi-year mean of those 30 annual values).
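The three workflow steps can be sketched with pandas on synthetic daily data. This is an illustration of the calculation, not the module's actual code; the mean_temperature column and the gamma-distributed rainfall are assumptions made for the example:

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering the 1991-2020 baseline
rng = np.random.default_rng(0)
dates = pd.date_range("1991-01-01", "2020-12-31", freq="D")
df = pd.DataFrame({
    "Precipitation": rng.gamma(0.5, 4.0, len(dates)),       # mm/day
    "mean_temperature": 24 + rng.normal(0, 2, len(dates)),  # °C
}, index=dates)

# 1. Period filtering: keep the continuous 30-year window
window = df.loc["1991-01-01":"2020-12-31"]

# 2. Temporal averaging: annual total precipitation and annual mean temperature
annual = window.groupby(window.index.year).agg(
    {"Precipitation": "sum", "mean_temperature": "mean"})

# 3. Normal calculation: multi-year mean of the 30 annual values
normal = annual.mean()
```

The result is one precipitation normal (mm/year) and one temperature normal (°C) for the location, which is exactly the Climatological Standard Normal described above.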

10. Expanding the climatology folder reveals the script file long_term_climatology.py.
11. To execute the long-term climatology logic, click on long_term_climatology.py to open the file for editing.

  • 2. the script file, long_term_climatology.py.

  • 3. Give the command you need to use in the terminal to run long_term_climatology.py.

  • 4. Shows the input command to be edited in the terminal.

  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit calculates the Climate Normal.

To ensure accurate outputs from the long_term_climatology function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end years must be provided.

    • Example: start-year 1991 --end-year 2020.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.


climatology Module Output

The climatology module returns the calculated Climatological Standard Normals, providing a structured summary report of location-specific average precipitation and temperature metrics.

Output Structure:

  • Data is returned as a structured data object
  • Variable names are standardized to internal toolkit names
  • Units are standardized:
    • Precipitation is in millimeters (mm) and millimeters per day (mm/day).
    • Temperature is in degrees Celsius (°C).


To get a better understanding of the module and its functions, you can review the long_term_climatology.py script in the climatology folder.

Module 3: compare_datasets (Data Interoperability & Validation)

Goal: Quantify the consistency, accuracy, and biases between different climate datasets.

  • Workflow: Identifies discrepancies and similarities between input datasets using statistical methods.

The screenshot below shows how to access the raw data easily.

To get a better understanding of the module and its functions, you can review the scripts in the compare_datasets folder.
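Typical statistical methods for quantifying consistency, accuracy, and bias between two datasets include mean bias, root-mean-square error (RMSE), and Pearson correlation. A minimal sketch on two synthetic daily rainfall series (not the module's actual code):

```python
import numpy as np

# Synthetic example: a reference series and a second dataset with added noise
rng = np.random.default_rng(42)
obs = rng.gamma(0.5, 4.0, 365)       # reference daily rainfall (mm)
est = obs + rng.normal(0, 1.0, 365)  # second dataset to compare against it

bias = float(np.mean(est - obs))                  # mean bias (systematic offset)
rmse = float(np.sqrt(np.mean((est - obs) ** 2)))  # root-mean-square error
corr = float(np.corrcoef(obs, est)[0, 1])         # Pearson correlation
```

Bias flags systematic over- or under-estimation, RMSE summarizes the typical magnitude of disagreement, and correlation indicates whether the two datasets capture the same temporal pattern.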

Module 4: season_analysis (Growth Windows)

Goal:

  • Workflow:

Module 5: climate_statistics (Baseline Integrity)

Goal:

  • Workflow:

Module 6: calculate_hazards (Extreme Events)

Goal:

  • Workflow:

Module 7: compare_period ()

Goal:

  • Workflow: