PR4327_C96C48_hybatmDA enkfgdas_esfc_GCYCLE_DATE_UNBOUND - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki
PR #4327 C96C48_hybatmDA — enkfgdas_esfc gcycle_date: unbound variable Failure
Test Case: C96C48_hybatmDA (Hybrid Atmospheric Data Assimilation)
Job: JGLOBAL_ENKF_SFC → exglobal_enkf_sfc.sh → global_cycle.sh
Platform: Hercules (node hercules-02-55)
PR: #4327 — "Fix two bugfixes to global_cycle.sh"
Author: ClaraDraper-NOAA
Commit: a05fd274
Date: December 16, 2025, 16:10–16:11 UTC (10:10–10:11 CST)
Log Source: Gist — enkfgdas_esfc.log
Status: RESOLVED (subsequent PR commits fixed the issue; CI-Hercules-Passed after fix)
Analysis Date: February 10, 2026
Executive Summary
The enkfgdas_esfc (EnKF surface update) job failed during CI testing of PR #4327 on Hercules with:
```
global_cycle.sh: line 281: gcycle_date: unbound variable
FATAL ERROR: Failed to update surface fields! RETURN CODE 1
```
Root Cause: PR #4327 modified global_cycle.sh to take the NAMCYC namelist time from a new variable, gcycle_date, so that the namelist time matches the restart time instead of the analysis time. However, the initial commit (a05fd274) did not add the corresponding export gcycle_date= to the calling script exglobal_enkf_sfc.sh. Under bash set -u (nounset), expanding the unset variable triggered an immediate abort.
Classification: Code Bug — Incomplete Cross-Script Variable Interface Update
Failure Chain
```
JGLOBAL_ENKF_SFC (J-Job)
│
├── jjob_header.sh → Sources config.base, config.esfc, HERCULES.env
│   ├── setpdy.sh → WARNING: COMROOT/date/t00z missing (non-fatal)
│   └── ./PDY → WARNING: No such file (non-fatal)
│
└── exglobal_enkf_sfc.sh
    │
    ├── regrid_gsiSfcIncr_to_tile.sh               ✅ exit 0 (3 seconds)
    │   ├── run_mpmd.sh (cmdfile_in)               ✅ exit 0 (cpreq orog/grid files)
    │   ├── regridStates.x (srun -n 12)            ✅ exit 0 (soil/snow increments)
    │   └── run_mpmd.sh (cmdfile_out)              ✅ exit 0 (cpfs output tiles)
    │
    ├── [MISSING: export gcycle_date=...]          ⚠️ NOT EXECUTED
    │
    └── global_cycle.sh                            ❌ exit 1
        └── line 281: ${gcycle_date:0:4} → "unbound variable"
            └── err_exit "Failed to update surface fields!" RETURN CODE 1
```
Detailed Analysis
1. Primary Error: gcycle_date: unbound variable
Location: global_cycle.sh line 281
Error Type: bash set -u (nounset) violation
The failing code in global_cycle.sh extracts date components from gcycle_date:
```bash
# global_cycle.sh lines 281-284
iy=${gcycle_date:0:4}  # year
im=${gcycle_date:4:2}  # month
id=${gcycle_date:6:2}  # day
ih=${gcycle_date:8:2}  # hour
```
These are used in the NAMCYC namelist (fort.36) passed to the global_cycle Fortran executable:
```
&NAMCYC
 idim=48, jdim=48, lsoil=4,
 iy=${iy}, im=${im}, id=${id}, ih=${ih}, fh=0,
 ...
/
```
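To show how the four substrings flow into the namelist, here is a minimal sketch that renders only the time entries of a NAMCYC-style block from gcycle_date. The function name render_namcyc_time and the example date are hypothetical. The real global_cycle.sh writes many more entries (idim, jdim, lsoil, file names), and its heredoc is not reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the real global_cycle.sh: render only the time
# entries of a NAMCYC-style namelist from the caller-provided gcycle_date.
render_namcyc_time() {
  # Same substring offsets as global_cycle.sh lines 281-284.
  local iy="${gcycle_date:0:4}"   # year
  local im="${gcycle_date:4:2}"   # month
  local id="${gcycle_date:6:2}"   # day
  local ih="${gcycle_date:8:2}"   # hour
  cat <<EOF
&NAMCYC
 iy=${iy}, im=${im}, id=${id}, ih=${ih}, fh=0,
/
EOF
}

# Example value chosen for illustration only.
export gcycle_date=2025121603
render_namcyc_time
# &NAMCYC
#  iy=2025, im=12, id=16, ih=03, fh=0,
# /
```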
Why PR #4327 introduced gcycle_date: The PR fixes issue #4326 — the surface analysis namelist was using the analysis time (PDY/cyc) when it should use the restart time. The variable gcycle_date was introduced as a caller-provided override to pass the correct time.
2. Missing Variable Export in Calling Script
MCP Tool find_env_dependencies("gcycle_date") revealed:
| Script | Role | Line | Value |
|---|---|---|---|
| exglobal_enkf_sfc.sh | Exporter | 213 | ${bPDY}${bcyc} (IAU beginning-of-window time) |
| exglobal_enkf_sfc.sh | Exporter | 291 | ${PDY}${cyc} (analysis center time) |
| exglobal_atmos_sfcanl.sh | Exporter | 158 | ${gcycle_dates[hr]} (per-hour in loop) |
| global_cycle.sh | Consumer | 281 | ${gcycle_date:0:4} (substring extraction) |
The export gcycle_date= statements now found at lines 213 and 291 of exglobal_enkf_sfc.sh were not yet present in the initial commit a05fd274. That commit only modified global_cycle.sh to consume the variable. It did not update the caller to provide it.
The variable was later added in a follow-up commit, after which CI-Hercules passed.
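The caller side of the fix can be sketched as a small function that picks the value according to the IAU setting and then exports it. set_gcycle_date and its argument order are hypothetical. bPDY/bcyc are assumed to be computed elsewhere, and the branching is inferred from the two export sites in the table above, not copied from exglobal_enkf_sfc.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the caller-side contract (not the actual
# exglobal_enkf_sfc.sh code): choose the NAMCYC time according to the IAU
# setting, then export it so a child script such as global_cycle.sh sees it.
# Arguments: DOIAU_ENKF PDY cyc bPDY bcyc (bPDY/bcyc assumed precomputed).
set_gcycle_date() {
  local doiau="$1" pdy="$2" cyc="$3" bpdy="$4" bcyc="$5"
  if [[ "${doiau}" == "YES" ]]; then
    gcycle_date="${bpdy}${bcyc}"   # IAU: beginning of the window (cf. line 213)
  else
    gcycle_date="${pdy}${cyc}"     # no IAU: analysis center time (cf. line 291)
  fi
  export gcycle_date               # assignment alone would not reach a child process
}

# Example values only: analysis at 2025121606, window beginning at 2025121603.
set_gcycle_date YES 20251216 06 20251216 03
echo "${gcycle_date}"   # 2025121603
```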
3. Execution Context
Config loaded from log:
| Parameter | Value | Source |
|---|---|---|
| CASE | C48 (ensemble resolution) | config.base line 207 |
| CASE_ENS | C48 | config.base line 207 |
| NMEM_ENS | 2 (CI runs with 2 members) | config |
| DOIAU_ENKF | YES | config.base line 356 |
| IAUFHRS_ENKF | 3,6,9 | config.base line 357 |
| GCYCLE_DO_SOILINCR | .false. | config.esfc line 37 |
| DO_GSISOILDA | NO | default |
| DONST | YES | config |
| APP | ATM | config.base line 156 |
| spack-stack | 1.9.2 (ue-oneapi-2024.1.0) | modules |
| APRUN_CYCLE | srun -l --export=ALL --hint=nomultithread -n 12 --cpus-per-task=1 | HERCULES.env |
Job timing (log timestamps appear to be UTC; the ABNORMAL EXIT line reports the same instant as 10:11:13 CST):
- Start: 16:10:57 UTC
- MPMD cmdfile_in: 16:11:05–16:11:08 (3s)
- regridStates.x: completed successfully
- MPMD cmdfile_out: 16:11:10–16:11:11 (1s)
- global_cycle.sh: 16:11:13 — FATAL ERROR
- Total wall time: ~16 seconds before failure
4. Pre-Existing Warnings (Non-Fatal)
| Log Line | Warning | Impact |
|---|---|---|
| 104 | sed: can't read .../COMROOT/date/t00z: No such file or directory | Non-fatal; raised in setpdy.sh and suppressed by \|\| true |
| 107 | ./PDY: No such file or directory | Non-fatal; raised in jjob_header.sh and suppressed by \|\| true |
| 929 | Previous cycle snow file gdas.t18z.snogrb_t1534.3072.1536 missing | Non-fatal; snow update disabled (FSNOL=99999.,FSNOS=99999.) |
| 1336 | I_MPI_EXTRA_FILESYSTEM_LIST environment variable is not supported | Non-fatal; Intel MPI informational message |
5. MCP Tool Analysis Summary
| MCP Tool | Key Finding |
|---|---|
| get_job_details("JGLOBAL_ENKF_SFC") | 71-line J-Job, configs: base + esfc, calls exglobal_enkf_sfc.sh via ENKFRESFCSH |
| describe_component("exglobal_enkf_sfc.sh") | 11,432 bytes, ensemble surface analysis on tiles, calls regrid + global_cycle |
| describe_component("global_cycle.sh") | 14,729 bytes, "pull script into global-workflow" refactor 2025-07-08 by Friedman |
| get_code_context("global_cycle") | Depends on: ftst_land_increments, ftst_read_increments, global_cycle_lib |
| find_env_dependencies("gcycle_date") | 2 exporting scripts (3 export sites) and 1 consumer: exglobal_enkf_sfc.sh (lines 213, 291), exglobal_atmos_sfcanl.sh (line 158), global_cycle.sh (line 281) |
| find_callers_callees("global_cycle") | Entry point (0 callers in graph), leaf function (0 callees) |
| trace_execution_path("JGLOBAL_ENKF_SFC") | 15 env dependencies: GDUMP_ENS, GDATE, ENKFRESFCSH, CASE_ENS, assim_freq, etc. |
| find_dependencies("exglobal_enkf_sfc.sh") | Exports: OMP_NUM_THREADS_CY, CYCLEXEC, FNACNA, CASE_IN, err → global_cycle.sh at hop 2 |
| search_pull_requests("4327") | Merged 2025-12-18, CI-Hercules-Passed (after fix), CI-Gaeac6-Failed (separate issue) |
| search_ee2_standards("unbound variable...") | EE2 requires descriptive error messages with a "FATAL ERROR:" prefix; the message in this log is compliant |
| get_operational_guidance(...) | EE2 standards: recovery capability for jobs >15min, separate post-processing jobs |
Root Cause Assessment
Primary: Incomplete Cross-Script Variable Interface Change
PR #4327 introduced gcycle_date as a new caller-to-callee variable contract between:
- Callers: exglobal_enkf_sfc.sh, exglobal_atmos_sfcanl.sh
- Callee: global_cycle.sh

The initial commit (a05fd274) only updated the callee (global_cycle.sh) to consume gcycle_date. It did not update every caller to export the variable. This is a classic interface contract violation: the consumer side of the contract changed, but the producer side did not.
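The fix exports gcycle_date before calling ${CYCLESH}. This implies that global_cycle.sh runs as a separate process and inherits only exported variables, so the caller's half of the contract is an export, not just an assignment. The illustration below uses a bash -c child as a stand-in for global_cycle.sh. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration only (not workflow code): a child bash process stands in for
# global_cycle.sh, which the caller runs as a separate process.

# Report whether a child process can see gcycle_date.
child_sees_gcycle_date() {
  bash -c 'if [ -n "${gcycle_date+x}" ]; then echo "set:${gcycle_date}"; else echo "unset"; fi'
}

demo_export_contract() {
  (
    unset gcycle_date              # start clean, even if the parent exported it
    gcycle_date=2025121603         # plain assignment: a shell variable only
    echo "before export: $(child_sees_gcycle_date)"
    export gcycle_date             # now part of the environment children inherit
    echo "after export: $(child_sees_gcycle_date)"
  )
}

demo_export_contract
# before export: unset
# after export: set:2025121603
```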
Contributing: bash set -u Enforcement
The global-workflow uses set -u (nounset) via preamble.sh, which correctly treats unset variables as errors. This is the intended behavior. Here, set -u prevented a silent failure in which iy, im, id, and ih would have been empty strings, leading to a malformed NAMCYC namelist and potentially incorrect Fortran execution.
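The difference set -u makes can be reproduced in a few lines. This illustration uses hypothetical function names and renders the NAMCYC time line with gcycle_date unset, first without nounset and then with it.

```bash
#!/usr/bin/env bash
# Illustration only: the NAMCYC time line rendered with gcycle_date missing.
render_time_line() {
  echo "iy=${gcycle_date:0:4}, im=${gcycle_date:4:2}, id=${gcycle_date:6:2}, ih=${gcycle_date:8:2}, fh=0,"
}

# Without set -u, the missing date expands to empty strings and the
# malformed line is produced silently.
without_nounset() { ( set +u; unset gcycle_date; render_time_line ); }

# With set -u, the first expansion of the unset variable aborts the subshell,
# so nothing is rendered.
with_nounset() {
  if ( set -u; unset gcycle_date; render_time_line ) >/dev/null 2>&1; then
    echo "rendered"
  else
    echo "aborted"
  fi
}

without_nounset   # prints: iy=, im=, id=, ih=, fh=0,
with_nounset      # prints: aborted
```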
Why This Was NOT Caught Earlier
- exglobal_atmos_sfcanl.sh (the non-EnKF caller) already had gcycle_date exports at the time of the PR, so the sfcanl path would not have failed.
- The EnKF path (exglobal_enkf_sfc.sh) is exercised only by the C96C48_hybatmDA CI test, not by the simpler C48_ATM tests.
- The original PR commit changed only global_cycle.sh, creating an asymmetric update.
Recommendations
Immediate (Applied — PR #4327 Fixed)
- Add export gcycle_date to exglobal_enkf_sfc.sh: lines 213 and 291 now export the variable before calling ${CYCLESH}. ✅ Applied in a subsequent commit.
Short-Term
- Add variable validation at entry to global_cycle.sh: check required variables at script start.
  ```bash
  # At top of global_cycle.sh, after variable initializations
  : "${gcycle_date:?ERROR: gcycle_date must be set by caller}"
  ```
  This provides a clearer error message than the generic "unbound variable" from set -u.
- Add a shellcheck CI step for variable usage: shellcheck follows files named in source= directives and reports variables that are referenced but never assigned (SC2154). It cannot see variables exported by a separate calling process, so for executed scripts such as global_cycle.sh it complements the entry check above rather than replacing it.
- Document the variable contract: add a comment block in global_cycle.sh listing the required caller-provided variables.
  ```bash
  # Required from caller:
  #   gcycle_date (YYYYMMDDHH format)
  #   - exglobal_enkf_sfc.sh: exports as ${bPDY}${bcyc} or ${PDY}${cyc}
  #   - exglobal_atmos_sfcanl.sh: exports as ${gcycle_dates[hr]}
  ```
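A stricter variant of the entry check could also verify the YYYYMMDDHH format stated in the contract comment. The helper below is a hypothetical sketch, not existing global-workflow code. It uses indirect expansion so that one function can validate any required date variable, and it keeps the "FATAL ERROR:" prefix that EE2 expects.

```bash
#!/usr/bin/env bash
# Hypothetical entry check (function name and messages are illustrative):
# goes beyond ${var:?} by also checking the value's format.
require_yyyymmddhh() {
  local name="$1"
  # ${!name+x} is non-empty only if the variable named by $name is set;
  # the +x form does not trip set -u.
  if [[ -z "${!name+x}" ]]; then
    echo "FATAL ERROR: ${name} must be exported by the caller" >&2
    return 1
  fi
  local value="${!name}"
  # Ten digits: YYYY MM DD HH. Calendar validity (e.g. month 13) is not checked.
  if [[ ! "${value}" =~ ^[0-9]{10}$ ]]; then
    echo "FATAL ERROR: ${name}='${value}' is not in YYYYMMDDHH format" >&2
    return 1
  fi
}

# Usage at the top of the callee. In the workflow, the failure branch would
# more likely call err_exit than a bare exit.
# require_yyyymmddhh gcycle_date || exit 1
```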
Long-Term
- Implement interface tests for shell script variable contracts: create a test that verifies all callers of global_cycle.sh export all required variables before invocation.
- Consider passing gcycle_date as a command-line argument: refactor global_cycle.sh to accept the date as a positional argument rather than relying on environment variable inheritance, reducing the risk of missing exports.
- Expand the CI test matrix: ensure that both the IAU (DOIAU_ENKF=YES) and non-IAU paths are tested in at least one CI case, since the two paths use different gcycle_date values (bPDY/bcyc vs. PDY/cyc).
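The first long-term item, an interface test, could start as a static check run in CI. The sketch below is hypothetical: the function name and matching rules are illustrative, and it assumes every caller invokes global_cycle.sh through ${CYCLESH}, as exglobal_enkf_sfc.sh does. It only verifies that an export gcycle_date line precedes the first ${CYCLESH} reference in each caller.

```bash
#!/usr/bin/env bash
# Hypothetical static contract check (not an existing global-workflow test):
# in each caller script, an "export gcycle_date" line must appear before the
# first line that references ${CYCLESH}. Textual order only approximates
# run-time order, so branch-specific paths still need CI coverage.
check_gcycle_contract() {
  local script status=0
  for script in "$@"; do
    if ! awk '
      /export[ \t]+gcycle_date/ { exported = 1 }
      index($0, "${CYCLESH}") > 0 && !exported { bad = 1; exit }
      END { exit bad }
    ' "${script}"; then
      echo "FAIL: ${script} invokes \${CYCLESH} before exporting gcycle_date" >&2
      status=1
    fi
  done
  return "${status}"
}

# Example usage against the two callers named in this report:
# check_gcycle_contract scripts/exglobal_enkf_sfc.sh scripts/exglobal_atmos_sfcanl.sh
```

Against the initial commit, where exglobal_enkf_sfc.sh called ${CYCLESH} with no gcycle_date export at all, this check would fail as intended. It cannot prove that each branch exports the right value, which is why the CI-matrix item remains necessary.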
Affected Files
| File | Role | Lines / Notes |
|---|---|---|
| ush/global_cycle.sh | Consumer (changed in PR) | 281–284: gcycle_date substring extraction |
| scripts/exglobal_enkf_sfc.sh | Exporter (missing in initial commit) | 213: ${bPDY}${bcyc}, 291: ${PDY}${cyc} |
| scripts/exglobal_atmos_sfcanl.sh | Exporter (already present) | 158: ${gcycle_dates[hr]} |
| dev/jobs/JGLOBAL_ENKF_SFC | J-Job wrapper | Calls exglobal_enkf_sfc.sh |
| ush/regrid_gsiSfcIncr_to_tile.sh | Upstream step | Succeeded (exit 0) |
| config.esfc | Config | GCYCLE_DO_SOILINCR=.false. |
Related Issues
| Issue | Title | Relevance |
|---|---|---|
| #4326 | global_cycle time mismatch | Direct — PR #4327 resolves this |
| #4325 | Soil moisture relaxation default | Direct — PR #4327 resolves this |
| #4327 | Fix two bugfixes to global_cycle.sh | This PR — merged 2025-12-18 |
Log Evidence Summary
| Log Line | Content | Significance |
|---|---|---|
| 68 | Begin JGLOBAL_ENKF_SFC at 16:10:57 | Job start |
| 104 | sed: can't read .../COMROOT/date/t00z | setpdy.sh warning (non-fatal) |
| 107 | ./PDY: No such file or directory | jjob_header.sh warning (non-fatal) |
| 929 | WARNING: Previous cycle snow file ... is missing | Snow update disabled for this cycle |
| 1293 | End run_mpmd.sh at 16:11:08 with error code 0 | MPMD input staging succeeded |
| 1704 | End run_mpmd.sh at 16:11:11 with error code 0 | MPMD output copy succeeded |
| 1914 | global_cycle.sh: line 281: gcycle_date: unbound variable | PRIMARY FAILURE |
| 1918 | err_exit 'Failed to update surface fields!' | Error handler invoked |
| 1922 | FATAL ERROR: Failed to update surface fields! RETURN CODE 1 | EE2-compliant error message |
| 1923 | ABNORMAL EXIT at Tue Dec 16 10:11:13 CST 2025 on hercules-02-55 | Job abort |
| 1971 | Failed to update surface fields! RETURN CODE 1 | Final error propagation |
Generated by EIB MCP-RAG analysis using 15+ MCP tools against global-workflow knowledge base (60,404 docs, 484,901 relationships)