User_Guide - Rwema25/AE-project GitHub Wiki

User Guide & Documentation: Climate Data Toolkit

1. Introduction & Project Context

1.1 Overview

The “Advancing Climate Data Integration in Agroecological Research” project, funded by the McKnight Foundation and implemented by the Alliance of Bioversity International and CIAT (ABC), in partnership with the African Institute of Mathematical Sciences (AIMS), seeks to improve the capacity of agroecological (AE) researchers and practitioners to integrate climate data into their work. By bridging the gap between raw meteorological data and actionable field insights, the project supports the design of resilient, diversified farming systems.

1.2 The Toolkit Purpose

Rather than creating new datasets, this toolkit serves as a consolidation and simplification layer for existing climate resources, such as the CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) dataset. It provides a streamlined, user-friendly interface to:

  • Automate data retrieval and standardization.
  • Calculate complex climate statistics and hazard indices (e.g., heat stress).
  • Align climate data with specific agronomic windows and crop cycles.
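As a flavour of the hazard indices involved, a heat-stress indicator can be as simple as counting the days whose maximum temperature exceeds a threshold. A minimal illustration on synthetic data (the 35 °C threshold and the series are illustrative, not the toolkit's actual definition):

```python
import numpy as np
import pandas as pd

# Synthetic daily maximum temperatures for one year (°C)
rng = np.random.default_rng(1)
tmax = pd.Series(30 + rng.normal(0, 4, 365))

# Count the days exceeding an illustrative 35 °C heat-stress threshold
heat_stress_days = int((tmax > 35).sum())
```

Real hazard indices layered on top of this idea (consecutive dry days, heat-wave spells, etc.) add run-length logic, but the thresholding step is the core.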

1.3 Target Audience

This documentation is designed for:

  • AE Researchers: Seeking to validate experimental findings with historical climate context.
  • Practitioners & Extension Agents: Designing adaptation strategies for smallholder farmers.
  • Decision Makers: Requiring evidence-based climate risk screenings for regional planning.

1.4 Datasets

To provide a comprehensive view of the agroecological environment, the toolkit integrates high-resolution historical data, future projections, and soil properties. The following tables detail the technical specifications, temporal extent, and sources for each dataset.

Rainfall & Temperature (Historical & Real-time)

These datasets are utilized for baseline historical analysis, monitoring recent trends, and identifying past climate hazards.

| Dataset | Variables | Spatial Res. | Temporal Res. | Coverage | Temporal Extent | Reference |
|---|---|---|---|---|---|---|
| CHIRPS v2.0 | Precipitation | 0.05° (~5.5 km) | Daily, pentadal (5-day), dekadal (10-day), monthly | Quasi-global (50°S–50°N) | 1981–Present | Climate Hazards Center (CHC), UCSB |
| CHIRTS v2.0 | Maximum and minimum temperature | 0.05° (~5.5 km) | Daily | Quasi-global (60°S–70°N) | 1983–Present | Climate Hazards Center (CHC), UCSB |
| AgERA5 | Standard ERA5 variables plus: evapotranspiration (actual/potential), soil water content (various depths), Leaf Area Index (LAI), Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), crop-specific indicators | 0.1° (~11 km) | Daily | Global | 1979–Present | AgERA5 |
| ERA5 | Air temperature (various levels), precipitation, surface radiation, wind speed/direction, soil moisture, evaporation, sea surface temperature, sea ice, and more | 0.25° (~31 km) | Hourly, monthly | Global | 1950–Present | Hersbach et al. (2020) |

Climate Projections (Future Scenarios)

This dataset supports forward-looking assessments to understand how agroecological systems may shift under different emission pathways. The toolkit utilizes a curated ensemble of 16 Global Climate Models (GCMs) to provide a robust range of future projections.

| Dataset | Variables | Spatial Res. | Temporal Res. | Coverage | Temporal Extent | Considered SSPs | Included GCM Models (16) | Reference |
|---|---|---|---|---|---|---|---|---|
| NEX-GDDP-CMIP6 | Precipitation, maximum and minimum temperature | 0.25° (~25 km) | Daily | Global | Historical: 1950–2014; Future: 2015–2100 | SSP1-2.6, SSP2-4.5, SSP5-8.5 | ACCESS-CM2, ACCESS-ESM1-5, CanESM5, CMCC-ESM2, EC-Earth3, EC-Earth3-Veg-LR, GFDL-ESM4, INM-CM4-8, INM-CM5-0, KACE-1-0-G, MIROC6, MPI-ESM1-2-LR, MRI-ESM2-0, NorESM2-LM, NorESM2-MM, TaiESM1 | NASA |

Soil Information

These datasets provide static soil physical and chemical properties required for calculating water holding capacity and nutrient availability.

| Dataset | Spatial Res. | Temporal Res. | Coverage | Depth Intervals | Temporal Extent | Reference |
|---|---|---|---|---|---|---|
| SoilGrids | 250 m | Static | Global | 0–200 cm | 2020 (v2.0) | SoilGrids |
| iSDAsoil | 30 m | Static | Africa | 0–50 cm | 2021 | Release |

2. Getting Started: Input Requirements

To begin working within the Visual Studio Code interface:

1. Open the Explorer sidebar if it is not already visible.
2. Click on the CLIMATE-TOOLKIT dropdown menu to reveal the available modules.


3. The expanded CLIMATE-TOOLKIT folder reveals several folders representing the toolkit's modules.
4. To start the data ingestion process for this phase, click on the relevant module folder within this list.


Module-by-Module Workflows

Module 1: fetch_data (Data Ingestion)

Goal: Retrieve and standardize climate data from external sources.

  • Workflow: The user works through the source_data $\rightarrow$ transform_data $\rightarrow$ preprocess_data functions in sequence.

5. Expanding the fetch_data folder reveals the specific functions available within the module.


6. To start the primary data fetching logic, click on the source_data function folder within this list. It reveals:

  • a list of the available datasets, and
  • the script file, source_data.py.


7. To execute the primary data sourcing logic, click on the source_data.py script to open the file for editing/configuration.

  • 2. Give the two-line command you first need to run in the terminal to activate the environment.
  • 3. Give the command you need to use to run `source_data.py` in the terminal.


  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ User inputs variables $\rightarrow$ Toolkit queries dataset API $\rightarrow$ source_data returns the data in raw format.

To ensure accurate outputs from the source_data function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end dates must be provided.

    • Example: 1991-01-01 --to 2020-12-31.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.
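The input formats above can be checked programmatically before any query is issued. A minimal validation sketch (the function name and return structure are illustrative, not part of the toolkit's API):

```python
from datetime import date

# Data sources accepted by the toolkit, per the list above
VALID_SOURCES = {"era_5", "nasa_power", "nex_gddp"}

def validate_inputs(lat, lon, start, end, source):
    """Check that inputs match the formats expected by source_data."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("Coordinates must be decimal degrees (WGS84).")
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d >= end_d:
        raise ValueError("Start date must precede end date.")
    if source not in VALID_SOURCES:
        raise ValueError(f"Unknown data source: {source}")
    return {"lat": lat, "lon": lon, "start": start_d, "end": end_d, "source": source}

# Example: Nairobi, Kenya over the 1991-2020 baseline
params = validate_inputs(-1.29, 36.82, "1991-01-01", "2020-12-31", "era_5")
```

Catching malformed inputs at this stage avoids silent failures when the toolkit later queries the dataset API.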


source_data Function Output

The source_data function returns the sourced data in its raw format, preserving original variable names and units.

Output Structure:

  • Data is returned as a DataFrame
  • Variable names match the original source (e.g., ERA5)
  • Units are retained as provided by the source

Example (era_5 Data):

  • Variables: ['date', 'total_precipitation', 'maximum_2m_air_temperature', 'minimum_2m_air_temperature']
  • Units:
    • date: datetime
    • precipitation: m (meter)
    • temperature: K (Kelvin)


8. To start the primary data transformation logic, click on the transform_data function folder within the list of functions. It reveals:

  • 2. the script file, transform_data.py.

  • 3. Give the command you need to use in the terminal to run transform_data.py.

  • 4. Shows the input command to be edited in the terminal.

  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit transforms the variables.

To ensure accurate outputs from the transform_data function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end dates must be provided.

    • Example: start 1991-01-01 --end 2020-12-31.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.


transform_data Function Output

The transform_data function returns the data in its original units, with only the variable names transformed to internal names.

Output Structure:

  • Data is returned as a DataFrame
  • Variable names change to internal variable names
  • Units are retained as provided by the source

All raw data variable names ingested by the toolkit are transformed into standardized internal variable names to ensure consistency across modules.

| Raw Variable (era_5) | Internal Variable Name | Description |
|---|---|---|
| total_precipitation (m) | Precipitation | Total daily precipitation |
| maximum_2m_air_temperature (K) | max_temperature | Maximum 2-meter air temperature |
| minimum_2m_air_temperature (K) | min_temperature | Minimum 2-meter air temperature |
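The renaming above amounts to a simple column mapping. A minimal pandas sketch (the mapping dict mirrors the table; the DataFrame values are illustrative, not real ERA5 data):

```python
import pandas as pd

# Mapping from raw era_5 names to internal toolkit names, per the table above
ERA5_RENAME = {
    "total_precipitation": "Precipitation",
    "maximum_2m_air_temperature": "max_temperature",
    "minimum_2m_air_temperature": "min_temperature",
}

# Illustrative raw frame as source_data would return it
raw = pd.DataFrame({
    "date": pd.to_datetime(["1991-01-01", "1991-01-02"]),
    "total_precipitation": [0.004, 0.000],          # metres
    "maximum_2m_air_temperature": [301.2, 302.5],   # Kelvin
    "minimum_2m_air_temperature": [288.9, 289.4],   # Kelvin
})

# Rename only; units are untouched at this stage
transformed = raw.rename(columns=ERA5_RENAME)
```

Keeping the units unchanged here is deliberate: unit conversion is the job of the next function, preprocess_data.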


9. To start the primary data preprocessing logic, click on the preprocess_data function folder within the list of functions. It reveals:

  • 2. the script file, preprocess_data.py.

  • 3. Give the command you need to use in the terminal to run preprocess_data.py.

  • 4. Shows the input command to be edited in the terminal.

  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit pre-processes the data.

To ensure accurate outputs from the preprocess_data function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end dates must be provided.

    • Example: start 1991-01-01 --end 2020-12-31.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.


preprocess_data Function Output

The preprocess_data function returns the converted data values, with transformed variable names and adjusted units.

Output Structure:

  • Data is returned as a DataFrame
  • Variable names change to internal variable names
  • Units are changed to standard units

All raw data ingested by the toolkit is pre-processed and transformed into a standardized internal format to ensure consistency across modules.

| Raw Variable (era_5) | Internal Variable Name | Unit | Description |
|---|---|---|---|
| total_precipitation (m) | Precipitation | mm | Total daily precipitation |
| maximum_2m_air_temperature (K) | max_temperature | °C | Maximum 2-meter air temperature |
| minimum_2m_air_temperature (K) | min_temperature | °C | Minimum 2-meter air temperature |
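The unit adjustments in the table above are simple arithmetic conversions. A minimal sketch (column names follow the internal names; the data values are illustrative):

```python
import pandas as pd

# Illustrative frame after transform_data: internal names, original units
df = pd.DataFrame({
    "Precipitation": [0.004, 0.012],     # metres, as delivered by era_5
    "max_temperature": [301.2, 302.5],   # Kelvin
    "min_temperature": [288.9, 289.4],   # Kelvin
})

# Convert to the toolkit's standard units
df["Precipitation"] *= 1000.0    # m  -> mm
df["max_temperature"] -= 273.15  # K  -> °C
df["min_temperature"] -= 273.15  # K  -> °C
```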


To get a better understanding of the module and its functions, you can review the source_data.py, transform_data.py, and preprocess_data.py scripts in the fetch_data folder.

Module 2: climatology (Location Profiling)

Goal: Establish a Climatological Standard Normal for a specific location.

  • Workflow: Period Filtering (continuous 30-year: January 1, 1991, to December 31, 2020) $\rightarrow$ Temporal Averaging (Calculates the Annual Mean Temperature and Annual Total Precipitation for each of the 30 years) $\rightarrow$ Normal Calculation (Computes the multi-year mean of those 30 annual values).
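The three workflow steps can be sketched with pandas on synthetic daily data. This is an illustration of the calculation, not the module's actual code; the mean_temperature column and the gamma-distributed rainfall are assumptions made for the example:

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering the 1991-2020 baseline
rng = np.random.default_rng(0)
dates = pd.date_range("1991-01-01", "2020-12-31", freq="D")
df = pd.DataFrame({
    "Precipitation": rng.gamma(0.5, 4.0, len(dates)),       # mm/day
    "mean_temperature": 24 + rng.normal(0, 2, len(dates)),  # °C
}, index=dates)

# 1. Period filtering: keep the continuous 30-year window
window = df.loc["1991-01-01":"2020-12-31"]

# 2. Temporal averaging: annual total precipitation and annual mean temperature
annual = window.groupby(window.index.year).agg(
    {"Precipitation": "sum", "mean_temperature": "mean"})

# 3. Normal calculation: multi-year mean of the 30 annual values
normal = annual.mean()
```

The result is one precipitation normal (mm/year) and one temperature normal (°C) for the location, which is exactly the Climatological Standard Normal described above.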

10. Expanding the climatology folder reveals the script file long_term_climatology.py.
11. To execute the long-term climatology logic, click on long_term_climatology.py to open the file for editing.

  • 2. the script file, long_term_climatology.py.

  • 3. Give the command you need to use in the terminal to run long_term_climatology.py.

  • 4. Shows the input command to be edited in the terminal.

  • Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit calculates the Climate Normal.

To ensure accurate outputs from the long_term_climatology function, users must provide inputs in the following formats:

  • Spatial Coordinates: Latitude and Longitude must be provided in Decimal Degrees (WGS84).

    • Example: -1.29, 36.82 (Nairobi, Kenya).
  • Time period: Start and end years must be provided.

    • Example: start-year 1991 --end-year 2020.
  • Data source: Source of data must be provided.

    • Example: era_5, nasa_power, nex_gddp.


climatology Module Output

The climatology module returns the calculated Climatological Standard Normals, providing a structured summary report of location-specific average precipitation and temperature metrics.

Output Structure:

  • Data is returned as a structured data object
  • Variable names are standardized to internal toolkit names
  • Units are standardized:
    • Precipitation is in millimeters (mm) and millimeters per day (mm/day).
    • Temperature is in degrees Celsius (°C).


To get a better understanding of the module and its functions, you can review the long_term_climatology.py script in the climatology folder.

Module 3: compare_datasets (Data Interoperability & Validation)

Goal: Quantify the consistency, accuracy, and biases between different climate datasets.

  • Workflow: Identifies discrepancies and similarities between input datasets using statistical methods.

The screenshot below shows how to access the raw data easily.

To get a better understanding of the module and its functions, you can review the scripts in the compare_datasets folder.
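Typical statistical methods for quantifying consistency, accuracy, and bias between two datasets include mean bias, root-mean-square error (RMSE), and Pearson correlation. A minimal sketch on two synthetic daily rainfall series (not the module's actual code):

```python
import numpy as np

# Synthetic example: a reference series and a second dataset with added noise
rng = np.random.default_rng(42)
obs = rng.gamma(0.5, 4.0, 365)       # reference daily rainfall (mm)
est = obs + rng.normal(0, 1.0, 365)  # second dataset to compare against it

bias = float(np.mean(est - obs))                  # mean bias (systematic offset)
rmse = float(np.sqrt(np.mean((est - obs) ** 2)))  # root-mean-square error
corr = float(np.corrcoef(obs, est)[0, 1])         # Pearson correlation
```

Bias flags systematic over- or under-estimation, RMSE summarizes the typical magnitude of disagreement, and correlation indicates whether the two datasets capture the same temporal pattern.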

Module 4: season_analysis (Growth Windows)

Goal:

  • Workflow:

Module 5: climate_statistics (Baseline Integrity)

Goal:

  • Workflow:

Module 6: calculate_hazards (Extreme Events)

Goal:

  • Workflow:

Module 7: compare_period ()

Goal:

  • Workflow: