Pre‐processed data - Rwema25/AE-project GitHub Wiki
This document explains the water balance calculation code used for estimating daily soil water availability and the Number of Soil Water Stress Days (NDWS). It breaks down the inputs, constants, functions, and formulas and illustrates how they connect.
The process calculates daily maximum potential evapotranspiration (ETmax) based on climate data, updates soil water availability using rainfall and evaporation demand, computes the ratio of actual to potential evapotranspiration (ERATIO), and finally counts the number of water stress days (NDWS) for spatial raster data over Africa.
| Variable | Description | Source/Value |
|---|---|---|
srad |
Solar radiation (MJ/m²/day) | Raster input |
tmin, tmean, tmax
|
Daily min, mean, max temperature (°C) | Raster input |
soilcp |
Soil water holding capacity (mm) | Soil raster input |
soilsat |
Soil saturation capacity (mm) | Soil raster input |
AVAIL |
Available soil water from previous day (mm) | Raster updated daily |
| Constant | Value | Description |
|---|---|---|
albedo |
0.2 | Fraction of solar radiation reflected by surface |
vpd_cte |
0.7 | Empirical adjustment constant for VPD |
a (a_eslope) |
611.2 | Constant in saturation vapor pressure slope formula |
b (b_eslope) |
17.67 | Constant in saturation vapor pressure exponential term |
c (c_eslope) |
243.5 | Constant in denominator of saturation vapor pressure formula |
pt_const |
1.26 | Priestley-Taylor constant for wet surface evaporation |
pt_fact |
1 | Adjustment factor for Priestley-Taylor coefficient |
vpd_ref |
1 | Reference value for VPD scaling |
psycho |
62 | Psychrometric constant (kPa/°C) |
rho_w |
997 | Density of water (kg/m³) |
rlat_ht |
2.26E6 | Latent heat of vaporization of water (J/kg) |
The function peest2 calculates potential evapotranspiration (ETmax) using an adapted Priestley-Taylor equation with vapor pressure deficit (VPD) adjustment and raster input data.
Formula for ETmax:
where:
-
$( pt_{\text{coef}} )$ is the Priestley-Taylor coefficient adjusted for vapor pressure deficit (VPD):
-
$( R_n )$ is net radiation available for evapotranspiration:
-
$( \Delta )$ is the slope of the saturation vapor pressure curve at mean air temperature ( T_{\text{mean}} ):
-
$( \gamma )$ is the psychrometric constant. -
Conversion factor translates energy flux into millimeters of evaporated water considering physical constants.
Calculating saturation vapor pressure
Where
Calculating vapor pressure deficit (VPD):
-
$(vpd_{\text{cte}})$ is an empirical constant for scaling VPD (set to 0.7).
This function updates the soil water availability,
Step 1: Calculate actual-to-potential evapotranspiration ratio (ERATIO)
- Ensures the percentage of available soil water relative to soil capacity is within bounds.
- Controls how much of the potential evapotranspiration is realized, limiting ET when soil water is scarce.
Step 2: Calculate daily water demand (actual evapotranspiration):
-
$( ET_{\max} )$ is the potential evapotranspiration for the day (frompeest2). -
$( Demand )$ is the actual water use by plants, moderated by soil water availability.
Step 3: Update soil water availability for day (t):
-
$(A(t-1))$ : soil water availability on previous day. -
$(P)$ : rainfall input (mm) for the day. - Ensures soil water availability never exceeds soil capacity or drops below zero.
| Variable | Description |
|---|---|
| Daily rainfall input (mm) | |
| Soil water availability on day |
|
| Soil water availability from previous day (mm) | |
| Maximum soil water holding capacity (mm) | |
| Actual to potential evapotranspiration ratio (dimensionless) | |
| Actual daily evapotranspiration (mm) | |
| Potential daily evapotranspiration (mm) |
This code was used in the adaptation atlas and can be accessed here Link
## Number of soil water stress days (NDWS)
## By: H. Achicanoy & A. Mendez
## April, 2025
# R options
args <- commandArgs(trailingOnly = T)
options(warn = -1, scipen = 999) # Remove warning alerts and scientific notation
suppressMessages(library(pacman))
suppressMessages(pacman::p_load(tidyverse,terra,gtools,lubridate,compiler,raster,ncdf4))
peest2 <- function(srad, tmin, tmean, tmax){
# Convert rasters to matrices for faster processing
srad_m <- raster::values(srad) # this variable comes in .nc format
tmin_m <- terra::values(tmin)
tmean_m <- terra::values(tmean)
tmax_m <- terra::values(tmax)
# Constants
albedo <- 0.2
vpd_cte <- 0.7
a_eslope <- 611.2
b_eslope <- 17.67
c_eslope <- 243.5
# Pre-allocate matrices
n_cells <- ncol(srad_m)
n_layers <- nrow(srad_m)
rn <- matrix(0, nrow = n_layers, ncol = n_cells)
eslope <- matrix(0, nrow = n_layers, ncol = n_cells)
et_max <- matrix(0, nrow = n_layers, ncol = n_cells)
# Optimized calculations
rn <- (1-albedo) * srad_m
rm(srad_m); gc()
temp_denom <- tmean_m + c_eslope
eslope <- (a_eslope * b_eslope * c_eslope * exp(b_eslope * tmean_m/temp_denom)) / (temp_denom^2)
rm(tmean_m, temp_denom); gc()
vpd <- vpd_cte * (0.61120 * (exp((17.67*tmax_m)/(tmax_m+243.5)) -
exp((17.67*tmin_m)/(tmin_m+243.5))))
rm(tmin_m, tmax_m); gc()
# Constants
pt_const <- 1.26
pt_fact <- 1
vpd_ref <- 1
psycho <- 62
rho_w <- 997
rlat_ht <- 2.26E6
# Final calculation
pt_coef <- 1 + (pt_fact*pt_const-1) * vpd / vpd_ref
conversion_factor <- 1E6 * 100 / (rlat_ht * rho_w) * 10
et_max <- pt_coef * rn * eslope/(eslope+psycho) * conversion_factor
# Convert back to raster
et_max_rast <- tmin
terra::values(et_max_rast) <- et_max
return(et_max_rast)
}
root <- '/home/jovyan/common_data'
ref <- terra::rast(paste0(root,'/atlas_hazards/roi/africa.tif'))
# Soil variables
scp <- terra::rast(paste0(root,'/atlas_hazards/soils/africa_scp.tif'))
scp <- scp |> terra::resample(ref) |> terra::mask(ref)
sst <- terra::rast(paste0(root,'/atlas_hazards/soils/africa_ssat.tif'))
sst <- sst |> terra::resample(ref) |> terra::mask(ref)
# Calculate NDWS function
calc_ndws <- function(yr, mn){
outfile <- paste0(out_dir,'/NDWS-',yr,'-',mn,'.tif')
cat(outfile,'\n')
if (!file.exists(outfile)) {
dir.create(dirname(outfile),F,T)
# Last day of the month
last_day <- lubridate::days_in_month(as.Date(paste0(yr,'-',mn,'-01')))
# Sequence of dates
if(as.numeric(yr) > 2020 & mn == '02'){
dts <- seq(from = as.Date(paste0(yr,'-',mn,'-01')), to = as.Date(paste0(yr,'-',mn,'-28')), by = 'day')
} else {
dts <- seq(from = as.Date(paste0(yr,'-',mn,'-01')), to = as.Date(paste0(yr,'-',mn,'-',last_day)), by = 'day')
}
cat(">>> Iniciando proceso:", yr, "-", mn, " \n")
pr_fls <- paste0(pr_pth,'/chirps-v2.0.',gsub(pattern='-', replacement='.', x=dts, fixed=T),'.tif')
pr_fls <- pr_fls[file.exists(pr_fls)]
tx_fls <- paste0(tx_pth,'/',yr,'/Tmax.',gsub(pattern='-', replacement='.', x=dts, fixed=T),'.tif')
tx_fls <- tx_fls[file.exists(tx_fls)]
tm_fls <- paste0(tm_pth,'/',yr,'/Tmin.',gsub(pattern='-', replacement='.', x=dts, fixed=T),'.tif')
tm_fls <- tm_fls[file.exists(tm_fls)]
sr_fls <- paste0(sr_pth,'/Solar-Radiation-Flux_C3S-glob-agric_AgERA5_',gsub(pattern='-', replacement='', x=dts, fixed=T),'_final-v1.0.nc')
sr_fls <- sr_fls[file.exists(sr_fls)]
# Read variables
prc <<- terra::rast(pr_fls)
tmx <<- terra::rast(tx_fls)
tmn <<- terra::rast(tm_fls)
tav <<- (tmx + tmn)/2
srd <<- raster::stack(sr_fls)
if (yr <= 2020) {srd <- srd/1000000}
# Maximum evapotranspiration
ETMAX <<- peest2(srad = srd, tmin = tmn, tmean = tav, tmax = tmx)
rm(list = c('tmn','tmx','tav','srd','sptd'))
# Compute water balance model
date <- paste0(yr,'-',mn)
if(date %in% c('1995-01','2021-01','2041-01','2061-01','2081-01')){
AVAIL <<- ref
AVAIL[!is.na(AVAIL)] <- 0
} else {
avail_fl <- list.files(path = dirname(outfile), pattern = 'AVAIL-')
avail_fl <- avail_fl[grep(pattern = '\\.tif', avail_fl)]
avail_fl <- avail_fl[length(avail_fl)]
AVAIL <<- terra::rast(paste0(dirname(outfile),'/',avail_fl))
AVAIL <<- AVAIL[[terra::nlyr(AVAIL)]]
}
eabyep_calc <- function(soilcp = scp, soilsat = ssat, avail = AVAIL, rain = prc[[1]], evap = ETMAX[[1]]){
avail <- min(avail, soilcp)
# ERATIO
percwt <- min(avail/soilcp*100, 100)
percwt <- max(percwt, 1)
eratio <- min(percwt/(97-3.868*sqrt(soilcp)), 1)
demand <- eratio * evap
result <- avail + rain - demand
# logging <- result - soilcp
# logging <- max(logging, 0)
# logging <- min(logging, soilsat)
# runoff <- result - logging + soilcp
avail <- min(soilcp, result)
avail <- max(avail, 0)
# runoff <- max(runoff, 0)
out <- list(Availability = c(AVAIL, avail),
# Demand = demand,
Eratio = eratio
# Logging = logging
)
return(out)
}
ceabyep_calc <- compiler::cmpfun(eabyep_calc)
watbal <- vector("list", terra::nlyr(ETMAX))
for(j in 1:terra::nlyr(ETMAX)){
water_balance <- ceabyep_calc(soilcp = scp,
soilsat = sst,
avail = AVAIL[[terra::nlyr(AVAIL)]],
rain = prc[[j]],
evap = ETMAX[[j]])
# Update AVAIL with deep copy to avoid memory leaks
AVAIL <- terra::deepcopy(water_balance$Availability)
# Store result and clean temporary objects
watbal[[j]] <- water_balance
rm(water_balance)
}
ERATIO <- watbal |> purrr::map('Eratio') |> terra::rast()
# Calculate number of soil water stress days
cvls <- matrix(data = c(-Inf, 0.5, 1), ncol = 3) # Classification values
NDWS <- terra::classify(x = ERATIO, rcl = cvls, right = F) |> sum()
terra::writeRaster(NDWS, outfile, overwrite = T)
terra::writeRaster(AVAIL, paste0(dirname(outfile),'/AVAIL-',yr,'-',mn,'.tif'), overwrite = T)
}
}
rm(list = c('tmn','tmx','tav','srd','sptd'))
rm(list = c('prc','ETMAX','AVAIL','watbal','ERATIO','NDWS'))
gc(verbose = F, full = T, reset = T)
#gcm = 'ACCESS-ESM1-5'
#ssp = 'ssp245'
#prd = '2081_2100'
#Rscript NDWS_1.R 'MPI-ESM1-2-HR' 'MRI-ESM2-0'
ssps <- args[1]
gcms <- args[2]
prds <- args[3]
#c('2021_2040','2041_2060','2061_2080','2081_2100')
# c('ACCESS-ESM1-5','EC-Earth3','INM-CM5-0','MPI-ESM1-2-HR','MRI-ESM2-0')
# Future setup
for (gcm in gcms) {
for (ssp in ssps) {
for (prd in prds) {
gc()
cmb <- paste0(ssp,'_',gcm,'_',prd)
prd_num <- as.numeric(unlist(strsplit(x = prd, split = '_')))
yrs <- prd_num[1]:prd_num[2]
mns <- c(paste0('0',1:9),10:12)
stp <- base::expand.grid(yrs, mns) |> base::as.data.frame(); rm(yrs,mns)
names(stp) <- c('yrs','mns')
stp <- stp |>
dplyr::arrange(yrs, mns) |>
base::as.data.frame()
pr_pth <- paste0(root,'/chirps_cmip6_africa/Prec_',gcm,'_',ssp,'_',prd) # Precipitation
tx_pth <- paste0(root,'/chirts_cmip6_africa/Tmax_',gcm,'_',ssp,'_',prd) # Maximum temperature
tm_pth <- paste0(root,'/chirts_cmip6_africa/Tmin_',gcm,'_',ssp,'_',prd) # Minimum temperature
sr_pth <- paste0(root,'/ecmwf_agera5_cmip6_africa/solar_radiation_flux_',gcm,'_',ssp,'_',prd) # Solar radiation
out_dir <- paste0(root,'/atlas_hazards/cmip6/indices/',cmb,'/NDWS')
yrs_mpg <- data.frame(Baseline = as.character(rep(1995:2014, 4)),
Future = as.character(c(2021:2040,2041:2060,
2061:2080,2081:2100)))
#i=1;yr = stp$yrs[i]; mn = stp$mns[i]
1:nrow(stp) |>
purrr::map(.f = function(i){
calc_ndws(yr = stp$yrs[i], mn = stp$mns[i])
if (i%%5 == 0) {
tmpfls <- list.files(tempdir(), full.names = T)
1:length(tmpfls) |>
purrr::map(.f = function(k) {system(paste0("rm -f ", tmpfls[k]))})
}
})
}
}
}- Onset (BEG): Moisture supply ≥ 0.5 ET marks start.
- Cessation (END): Moisture supply falls below 0.5 ET.
- Seasonal length (LGP): Duration from BEG to END, including stored moisture use, defines the growing season length.
Where:
-
$( T_{\text{max}} )$ and$( T_{\text{min}} )$ are the daily maximum and minimum temperatures (°C). -
$( R_a )$ is extraterrestrial radiation (mm/day).
Where:
-
$G_{sc}$ = 0.0820 MJ$m^{-2}min^{-1}$ - All other terms as defined below.
Where:
-
$( J )$ is the day of the year.
Where:
-
$( \phi )$ is the latitude in radians. -
$( \delta )$ is the solar declination.
This code was tested for its ability to identify a season by marking the start and end dates, and the output results can be accessed here Link
import requests
import pandas as pd
import math
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
# ------------ Climate data fetching -------------------------------
def get_climate_data(lat, lon, start_date, end_date, model="MRI_AGCM3_2_S"):
url = "https://climate-api.open-meteo.com/v1/climate"
params = {
"latitude": lat,
"longitude": lon,
"start_date": start_date,
"end_date": end_date,
"models": model,
"daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
"temperature_unit": "celsius",
"precipitation_unit": "mm",
"timeformat": "iso8601",
"timezone": "auto"
}
response = requests.get(url, params=params)
if response.status_code == 200:
data = response.json()
dates = data['daily']['time']
print(f"Received {len(dates)} days from {start_date} to {end_date}")
df = pd.DataFrame({
"date": pd.to_datetime(dates),
"tmax": data["daily"]["temperature_2m_max"],
"tmin": data["daily"]["temperature_2m_min"],
"precip": data["daily"]["precipitation_sum"],
})
return df
else:
raise Exception(f"API error {response.status_code}: {response.text}")
# ------------ Helper functions for ET0 calculation ----------------
def deg2rad(deg): return deg * math.pi / 180.0
def day_of_year(date): return date.timetuple().tm_yday
def sol_dec(J): return 0.409 * math.sin((2 * math.pi / 365) * J - 1.39)
def inv_rel_dist_earth_sun(J): return 1 + 0.033 * math.cos((2 * math.pi / 365) * J)
def sunset_hour_angle(lat, sol_decl):
val = -math.tan(lat) * math.tan(sol_decl)
val = max(min(val, 1), -1)
return math.acos(val)
def et_rad(lat, sol_decl, sha, ird):
Gsc = 0.0820
ra = ((24 * 60) / math.pi) * Gsc * ird * (
sha * math.sin(lat) * math.sin(sol_decl) +
math.cos(lat) * math.cos(sol_decl) * math.sin(sha)
)
return ra
def hargreaves(tmin, tmax, Ra):
if tmax is None or tmin is None or tmax < tmin:
return None
Tmean = (tmax + tmin) / 2
return 0.0023 * math.sqrt(tmax - tmin) * (Tmean + 17.8) * Ra
# ------------ Calculate ET0 for a period ---------------------------
def calculate_et_for_period(lat, lon, start_date, end_date, model="MRI_AGCM3_2_S"):
df = get_climate_data(lat, lon, start_date, end_date, model)
lat_rad = deg2rad(lat)
et_values = []
for _, row in df.iterrows():
J = day_of_year(row["date"].to_pydatetime())
sol_decl = sol_dec(J)
ird = inv_rel_dist_earth_sun(J)
sha = sunset_hour_angle(lat_rad, sol_decl)
Ra = et_rad(lat_rad, sol_decl, sha, ird)
et0 = hargreaves(row["tmin"], row["tmax"], Ra)
et_values.append(et0)
df["ET0_mm_day"] = et_values
return df
# ------------ Monthly chunks for year -------------------------------
def monthly_chunks(year):
chunks = []
for m in range(1, 13):
start = datetime(year, m, 1)
if m == 12:
end = datetime(year, 12, 31)
else:
end = datetime(year, m + 1, 1) - timedelta(days=1)
chunks.append((start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")))
return chunks
# ------------ Fetch and calculate ET0 for full year -----------------
def fetch_full_year_et(lat, lon, year, model="MRI_AGCM3_2_S"):
all_dfs = []
for start, end in monthly_chunks(year):
print(f"Fetching data for {start} to {end} with model {model}")
df_month = calculate_et_for_period(lat, lon, start, end, model)
all_dfs.append(df_month)
full_df = pd.concat(all_dfs).drop_duplicates(subset="date").reset_index(drop=True)
return full_df.sort_values("date")
# ------------ Detect onset and cessation dates ----------------------
def detect_onset_cessation(df, gap_days=30, min_rainy_period_days=5):
precip = df["precip"].to_numpy()
et0 = df["ET0_mm_day"].to_numpy()
dates = df["date"].to_numpy()
threshold = 0.5 * et0
rainy_flags = precip >= threshold
results = []
i = 0
n = len(df)
while i < n:
if rainy_flags[i]:
onset_date = dates[i]
dry_counter = 0
j = i + 1
cessation_date = None
while j < n:
if not rainy_flags[j]:
dry_counter += 1
if dry_counter >= gap_days:
cessation_date = dates[j - gap_days]
break
else:
dry_counter = 0
j += 1
if cessation_date is None:
cessation_date = None
end_for_duration = cessation_date if cessation_date else dates[-1]
rainy_duration = (pd.to_datetime(end_for_duration) - pd.to_datetime(onset_date)).days + 1
if rainy_duration >= min_rainy_period_days:
results.append({"onset": onset_date, "cessation": cessation_date, "length_days": rainy_duration})
if cessation_date:
try:
idx_after_cess = next(idx for idx, v in enumerate(dates) if v == cessation_date)
i = idx_after_cess + 1
except StopIteration:
i = j
else:
break
else:
i += 1
return results
# ------------ Main execution -----------------------------------------
if __name__ == "__main__":
latitude = -1.2921 # Nairobi latitude
longitude = 36.8219 # Nairobi longitude
year = 1985 # Example year (you can change)
model = "MRI_AGCM3_2_S"
df_et = fetch_full_year_et(latitude, longitude, year, model)
print(df_et.head())
print(f"Total days retrieved: {len(df_et)}")
onset_cessation_pairs = detect_onset_cessation(df_et, gap_days=30, min_rainy_period_days=5)
if not onset_cessation_pairs:
print("No onset detected.")
else:
for idx, pair in enumerate(onset_cessation_pairs, 1):
onset_str = pd.to_datetime(pair["onset"]).strftime('%Y-%m-%d')
cessation_str = pd.to_datetime(pair["cessation"]).strftime('%Y-%m-%d') if pair["cessation"] else "No cessation detected"
length = pair["length_days"]
print(f"Rainy period {idx}: Onset = {onset_str} ; Cessation = {cessation_str} ; Growing season length = {length} days")
#------------------visualisation----------------------
plt.figure(figsize=(12,4))
plt.plot(df_et["date"], df_et["precip"], label="Precipitation (mm)", color="blue")
plt.plot(df_et["date"], df_et["ET0_mm_day"]*0.5, label="0.5 * ET0", color="orange")
for season in onset_cessation_pairs:
onset = pd.to_datetime(season["onset"])
cess = pd.to_datetime(season["cessation"]) if season["cessation"] else df_et["date"].iloc[-1]
plt.axvspan(onset, cess, color="green", alpha=0.3)
plt.legend()
plt.title("Precipitation, 0.5 ET0 and Growing Seasons-Nairobi 1985")