User_Guide - Rwema25/AE-project GitHub Wiki
User Guide & Documentation: Climate Data Toolkit
1. Introduction & Project Context
1.1 Overview
The “Advancing Climate Data Integration in Agroecological Research” project, funded by the McKnight Foundation and implemented by the Alliance of Bioversity International and CIAT (ABC), in partnership with the African Institute of Mathematical Sciences (AIMS), seeks to improve the capacity of agroecological (AE) researchers and practitioners to integrate climate data into their work. By bridging the gap between raw meteorological data and actionable field insights, the project supports the design of resilient, diversified farming systems.
1.2 The Toolkit Purpose
Rather than creating new datasets, this toolkit serves as a consolidation and simplification layer for existing climate resources, such as the CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) dataset. It provides a streamlined, user-friendly interface to:
- Automate data retrieval and standardization.
- Calculate complex climate statistics and hazard indices (e.g., heat stress).
- Align climate data with specific agronomic windows and crop cycles.
1.3 Target Audience
This documentation is designed for:
- AE Researchers: Seeking to validate experimental findings with historical climate context.
- Practitioners & Extension Agents: Designing adaptation strategies for smallholder farmers.
- Decision Makers: Requiring evidence-based climate risk screenings for regional planning.
1.4 Datasets
To provide a comprehensive view of the agroecological environment, the toolkit integrates high-resolution historical data, future projections, and soil properties. The following tables detail the technical specifications, temporal extent, and sources for each dataset.
Rainfall & Temperature (Historical & Real-time)
These datasets are utilized for baseline historical analysis, monitoring recent trends, and identifying past climate hazards.
| Dataset | Variables | Spatial Res. | Temporal Res. | Coverage | Temporal Extent | Reference |
|---|---|---|---|---|---|---|
| CHIRPS v2.0 | Precipitation | 0.05° (~5.5km) | Daily, pentadal (5-day), dekadal (10-day), monthly | Quasi-global (50°S-50°N) | 1981–Present | Climate Hazards Center (CHC), UCSB |
| CHIRTS v2.0 | Maximum and minimum temperature | 0.05° (~5.5km) | Daily | Quasi-global (60°S-70°N) | 1983–Present | Climate Hazards Center (CHC), UCSB |
| AgERA5 | Standard ERA5 variables plus: Evapotranspiration (actual/potential), Soil water content (various depths), Leaf Area Index (LAI), Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), crop-specific indicators | 0.1° (~11km) | Daily | Global | 1979–Present | AgERA5 |
| ERA5 | Air temperature (various levels), precipitation, surface radiation, wind speed/direction, soil moisture, evaporation, sea surface temperature, sea ice, and more | 0.25° (~31km) | Hourly, monthly | Global | 1950–Present | Hersbach et al. (2020) |
Climate Projections (Future Scenarios)
This dataset supports forward-looking assessments to understand how agroecological systems may shift under different emission pathways. The toolkit utilizes a curated ensemble of 16 Global Climate Models (GCMs) to provide a robust range of future projections.
| Dataset | Variables | Spatial Res. | Temporal Res. | Coverage | Temporal Extent | Considered SSPs | Included GCM Models (16) | Reference |
|---|---|---|---|---|---|---|---|---|
| NEX-GDDP-CMIP6 | Precipitation, Maximum and minimum temperature | 0.25° (~25km) | Daily | Global | Historical: 1950–2014, Future: 2015–2100 | SSP1-2.6, SSP2-4.5, SSP5-8.5 | ACCESS-CM2, ACCESS-ESM1-5, CanESM5, CMCC-ESM2, EC-Earth3, EC-Earth3-Veg-LR, GFDL-ESM4, INM-CM4-8, INM-CM5-0, KACE-1-0-G, MIROC6, MPI-ESM1-2-LR, MRI-ESM2-0, NorESM2-LM, NorESM2-MM, TaiESM1 | NASA |
Soil Information
These datasets provide static soil physical and chemical properties required for calculating water holding capacity and nutrient availability.
| Dataset | Spatial Res. | Temporal Res. | Coverage | Depth Intervals | Temporal Extent | Reference |
|---|---|---|---|---|---|---|
| SoilGrids | 250 m | Static | Global | 0–200 cm | 2020 (v2.0) | SoilGrids |
| iSDAsoil | 30 m | Static | Africa | 0–50 cm | 2021 Release | iSDAsoil |
2. Getting Started: Input Requirements
To begin working within the Visual Studio Code interface:
1. Open the Explorer sidebar if it is not already visible.
2. Click on the CLIMATE-TOOLKIT dropdown menu to reveal the available modules.
3. The expanded CLIMATE-TOOLKIT folder reveals several folders representing the toolkit's modules.
4. To start the data ingestion process for this phase, click on the module folder within this list.
Module-by-Module Workflows
Module 1: fetch_data (Data Ingestion)
Goal: Retrieve and standardize data.
- Workflow: The user works through the source_data $\rightarrow$ transform_data $\rightarrow$ preprocess_data functions.
5. Expanding the fetch_data folder reveals the specific functions available within the module.
6. To start the primary data fetching logic, click on the source_data function folder within this list.
It reveals:
- the list of available datasets, and
- the script file source_data.py.
7. To execute the primary data sourcing logic, click on the source_data.py script to open the file for editing/configuration. The screenshot annotations show:
- (2) the two-line command you first need to run in the terminal to activate the environment;
- (3) the command you need to use to run source_data.py in the terminal.
- Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ User inputs variables $\rightarrow$ Toolkit queries dataset API $\rightarrow$ source_data returns the data in raw format.
To ensure accurate outputs from the source_data function, users must provide inputs in the following formats:
- Spatial Coordinates: Latitude and longitude must be provided in Decimal Degrees (WGS84).
  - Example: -1.29, 36.82 (Nairobi, Kenya).
- Time period: Start and end dates must be provided.
  - Example: 1991-01-01 --to 2020-12-31.
- Data source: Source of data must be provided.
  - Example: era_5, nasa_power, nex_gddp.
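The input rules above can be checked with a small helper. The function below is only an illustrative sketch, not part of the toolkit's own code; the function name and the set of accepted source identifiers are assumptions based on the examples listed above.

```python
from datetime import date

def validate_inputs(lat, lon, start, end, source):
    """Illustrative check of the input formats described above."""
    # Coordinates must be decimal degrees (WGS84)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("Coordinates must be decimal degrees (WGS84)")
    # Dates must be ISO YYYY-MM-DD and in chronological order
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d >= end_d:
        raise ValueError("Start date must precede end date")
    # Data source must be one of the supported identifiers
    if source not in {"era_5", "nasa_power", "nex_gddp"}:
        raise ValueError(f"Unknown data source: {source}")
    return True

# Nairobi, Kenya over the 1991-2020 baseline
validate_inputs(-1.29, 36.82, "1991-01-01", "2020-12-31", "era_5")
```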
source_data Function Output
The source_data function returns the sourced data in its raw format, preserving original variable names and units.
Output Structure:
- Data is returned as a DataFrame
- Variable names match the original source (e.g., ERA5)
- Units are retained as provided by the source
Example (era_5 Data):
- Variables: ['date', 'total_precipitation', 'maximum_2m_air_temperature', 'minimum_2m_air_temperature']
- Units:
- date: datetime
- precipitation: m (meter)
- temperature: K (Kelvin)
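As a concrete illustration, a raw era_5 result with the structure above might look like the DataFrame below. The values are made up; only the column names and units follow the output specification.

```python
import pandas as pd

# Illustrative raw era_5 output: original variable names and units
# (precipitation in metres, temperatures in Kelvin); values are synthetic.
raw = pd.DataFrame({
    "date": pd.to_datetime(["1991-01-01", "1991-01-02"]),
    "total_precipitation": [0.0031, 0.0000],        # m
    "maximum_2m_air_temperature": [299.4, 301.2],   # K
    "minimum_2m_air_temperature": [287.1, 288.0],   # K
})
```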
8. To start the primary data transform logic, click on the transform_data function folder within the list of functions.
It reveals:
- (2) the script file transform_data.py;
- (3) the command you need to use in the terminal to run transform_data.py;
- (4) the input command to be edited in the terminal.
- Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit transforms variables.
To ensure accurate outputs from the transform_data function, users must provide inputs in the following formats:
- Spatial Coordinates: Latitude and longitude must be provided in Decimal Degrees (WGS84).
  - Example: -1.29, 36.82 (Nairobi, Kenya).
- Time period: Start and end dates must be provided.
  - Example: start 1991-01-01 --end 2020-12-31.
- Data source: Source of data must be provided.
  - Example: era_5, nasa_power, nex_gddp.
transform_data Function Output
The transform_data function returns the sourced data values unchanged, with only the variable names transformed.
Output Structure:
- Data is returned as a DataFrame
- Variable names change to internal variable names
- Units are retained as provided by the source
All raw data variable names ingested by the toolkit are transformed into standardized internal variable names to ensure consistency across modules.
| Raw Variable (era_5) | Internal Variable Name | Description |
|---|---|---|
| total_precipitation (m) | precipitation | Total daily precipitation |
| maximum_2m_air_temperature (K) | max_temperature | Maximum 2-meter air temperature |
| minimum_2m_air_temperature (K) | min_temperature | Minimum 2-meter air temperature |
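The renaming step above amounts to applying a column mapping. The sketch below shows how such a mapping could be applied with pandas; the dictionary and lowercase internal names are illustrative assumptions, not the toolkit's actual code.

```python
import pandas as pd

# Mapping from raw era_5 names to internal names, per the table above
RENAME_MAP = {
    "total_precipitation": "precipitation",
    "maximum_2m_air_temperature": "max_temperature",
    "minimum_2m_air_temperature": "min_temperature",
}

raw = pd.DataFrame({
    "date": pd.to_datetime(["1991-01-01"]),
    "total_precipitation": [0.0031],
    "maximum_2m_air_temperature": [299.4],
    "minimum_2m_air_temperature": [287.1],
})
# Values and units are untouched; only the column names change
transformed = raw.rename(columns=RENAME_MAP)
```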
9. To start the primary data pre-processing logic, click on the preprocess_data function folder within the list of functions.
It reveals:
- (2) the script file preprocess_data.py;
- (3) the command you need to use in the terminal to run preprocess_data.py;
- (4) the input command to be edited in the terminal.
- Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit pre-processes data.
To ensure accurate outputs from the preprocess_data function, users must provide inputs in the following formats:
- Spatial Coordinates: Latitude and longitude must be provided in Decimal Degrees (WGS84).
  - Example: -1.29, 36.82 (Nairobi, Kenya).
- Time period: Start and end dates must be provided.
  - Example: start 1991-01-01 --end 2020-12-31.
- Data source: Source of data must be provided.
  - Example: era_5, nasa_power, nex_gddp.
preprocess_data Function Output
The preprocess_data function returns the converted data values, with transformed variable names and adjusted units.
Output Structure:
- Data is returned as a DataFrame
- Variable names change to internal variable names
- Units are changed to standard units
All raw data ingested by the toolkit is pre-processed and transformed into a standardized internal format to ensure consistency across modules.
| Raw Variable (era_5) | Internal Variable Name | Unit | Description |
|---|---|---|---|
| total_precipitation (m) | precipitation | mm | Total daily precipitation |
| maximum_2m_air_temperature (K) | max_temperature | °C | Maximum 2-meter air temperature |
| minimum_2m_air_temperature (K) | min_temperature | °C | Minimum 2-meter air temperature |
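The unit conversions in the table above are simple scalings: metres to millimetres (×1000) and Kelvin to degrees Celsius (−273.15). The sketch below illustrates this step; the function name and sample values are assumptions, not the toolkit's implementation.

```python
import pandas as pd

# Illustrative pre-processing: rename era_5 columns and convert units
# to the toolkit's standard units (m -> mm, K -> degrees Celsius).
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.rename(columns={
        "total_precipitation": "precipitation",
        "maximum_2m_air_temperature": "max_temperature",
        "minimum_2m_air_temperature": "min_temperature",
    })
    out["precipitation"] = out["precipitation"] * 1000.0      # m -> mm
    out["max_temperature"] = out["max_temperature"] - 273.15  # K -> °C
    out["min_temperature"] = out["min_temperature"] - 273.15  # K -> °C
    return out

raw = pd.DataFrame({
    "date": pd.to_datetime(["1991-01-01"]),
    "total_precipitation": [0.0031],        # -> 3.1 mm
    "maximum_2m_air_temperature": [299.4],  # -> 26.25 °C
    "minimum_2m_air_temperature": [287.1],  # -> 13.95 °C
})
clean = preprocess(raw)
```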
To get a better understanding of the module and its associated functions, you can view the module's source code.
Module 2: climatology (Location Profiling)
Goal: Establish a Climatological Standard Normal for a specific location.
- Workflow: Period Filtering (continuous 30-year: January 1, 1991, to December 31, 2020) $\rightarrow$ Temporal Averaging (Calculates the Annual Mean Temperature and Annual Total Precipitation for each of the 30 years) $\rightarrow$ Normal Calculation (Computes the multi-year mean of those 30 annual values).
10. Expanding the climatology folder reveals the script file long_term_climatology.py.
11. To execute the long-term climatology logic, click on long_term_climatology.py to open the file for editing. The screenshot annotations show:
- (2) the script file long_term_climatology.py;
- (3) the command you need to use in the terminal to run long_term_climatology.py;
- (4) the input command to be edited in the terminal.
- Workflow: User inputs coordinates $\rightarrow$ User inputs time period $\rightarrow$ User inputs data source $\rightarrow$ Toolkit calculates the Climate Normal.
To ensure accurate outputs from the long_term_climatology function, users must provide inputs in the following formats:
- Spatial Coordinates: Latitude and longitude must be provided in Decimal Degrees (WGS84).
  - Example: -1.29, 36.82 (Nairobi, Kenya).
- Time period: Start and end years must be provided.
  - Example: start-year 1991 --end-year 2020.
- Data source: Source of data must be provided.
  - Example: era_5, nasa_power, nex_gddp.
climatology Module Output
The climatology module returns the calculated Climatological Standard Normals, providing a structured summary report of location-specific average precipitation and temperature metrics.
Output Structure:
- Data is returned as a structured data object
- Variable names are standardized to internal toolkit names
- Units are standardized to SI-compliant units:
- Precipitation is in millimeters (mm) and millimeters per day (mm/day).
- Temperature is in degrees Celsius (°C).
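The normal calculation described in the workflow (annual aggregation over the 1991–2020 window, then a multi-year mean of the 30 annual values) can be sketched as follows. The daily data here is synthetic and the aggregation names are illustrative, not the toolkit's own.

```python
import numpy as np
import pandas as pd

# Synthetic daily record for the 1991-2020 baseline period
rng = np.random.default_rng(0)
days = pd.date_range("1991-01-01", "2020-12-31", freq="D")
daily = pd.DataFrame({
    "date": days,
    "precipitation": rng.gamma(0.5, 4.0, len(days)),      # mm/day (synthetic)
    "max_temperature": 25 + rng.normal(0, 2, len(days)),  # °C (synthetic)
    "min_temperature": 14 + rng.normal(0, 2, len(days)),  # °C (synthetic)
})

# Step 1: temporal averaging - annual totals/means for each of the 30 years
annual = daily.groupby(daily["date"].dt.year).agg(
    annual_total_precip=("precipitation", "sum"),  # mm/year
    annual_mean_tmax=("max_temperature", "mean"),  # °C
    annual_mean_tmin=("min_temperature", "mean"),  # °C
)

# Step 2: normal calculation - multi-year mean of the 30 annual values
normals = annual.mean()
```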
To get a better understanding of the module and its associated functions, you can view the module's source code.
Module 3: compare_datasets (Data Interoperability & Validation)
Goal: Quantify the consistency, accuracy, and biases between different climate datasets.
- Workflow: Identifies discrepancies and similarities between input datasets using statistical methods.
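Statistical methods commonly used for this kind of validation include mean bias, root-mean-square error, and correlation. The sketch below illustrates these metrics on two synthetic precipitation series; it is not the toolkit's implementation, and the dataset names are placeholders.

```python
import numpy as np

def compare(a: np.ndarray, b: np.ndarray) -> dict:
    """Illustrative consistency metrics between two aligned series."""
    bias = float(np.mean(a - b))                  # mean bias
    rmse = float(np.sqrt(np.mean((a - b) ** 2)))  # root-mean-square error
    r = float(np.corrcoef(a, b)[0, 1])            # Pearson correlation
    return {"bias": bias, "rmse": rmse, "correlation": r}

# Two synthetic daily precipitation series at the same location (mm/day)
rng = np.random.default_rng(1)
series_a = rng.gamma(0.5, 4.0, 365)
series_b = series_a + rng.normal(0.2, 1.0, 365)  # offset plus noise
stats = compare(series_a, series_b)
```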
The screenshot below helps with easy access to the raw data.
To get a better understanding of the module and its associated functions, you can view the module's source code.
Module 4: season_analysis (Growth Windows)
Goal:
- Workflow:
Module 5: climate_statistics (Baseline Integrity)
Goal:
- Workflow:
Module 6: calculate_hazards (Extreme Events)
Goal:
- Workflow:
Module 7: compare_period ()
Goal:
- Workflow: