C48_gsienkf_atmDA gdas_prep Error Analysis PR4555 - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki
CI Error Analysis: PR #4555 — gdas_prep Failure (declare_from_tmpl: command not found)
Job: gdas_prep (rocoto job card dev/job_cards/rocoto/prep.sh)
CI Test Case: C48_gsienkf_atmDA (PR_4555_C48_gsienkf_atmDA)
Platform: URSA (SST Innovation Center)
Commit: 733bbf3e (branch feature/de-template_com)
Error Code: 127 (command not found)
Date: February 2026
Source Gist: https://gist.github.com/emcbot/0fbab0d6e8182861e9a559119b65c713
PR: NOAA-EMC/global-workflow#4555 — "De-template COM declarations"
Executive Summary
The gdas_prep job failed with exit code 127 because the shell function declare_from_tmpl was not found at prep.sh line 152. PR #4555 aims to remove all calls to declare_from_tmpl and replace them with explicit declare statements, but the conversion was incomplete — unconverted calls remain inside the DOENKFONLY_ATM conditional block of prep.sh. Since DOENKFONLY_ATM=YES is only set in enkf-enabled test cases like C48_gsienkf_atmDA, this code path was likely missed during initial testing.
Error Chain
| Step | Source | Line | Event | Severity |
|---|---|---|---|---|
| 1 | jjob_header.sh | 95 | setpdy.sh fails: COMROOT/date/t00z not found | WARNING (caught by \|\| true) |
| 2 | jjob_header.sh | 96 | source ./PDY fails: PDY file never created | WARNING (caught by \|\| true) |
| 3 | prep.sh | 31–37 | COM declarations execute successfully (already converted to explicit declare) | OK |
| 4 | prep.sh | 60 | getdump.sh copies 50+ obs files successfully | OK |
| 5 | prep.sh | 138–140 | Additional COM declarations execute successfully (converted) | OK |
| 6 | prep.sh | 151 | [[ ${DOENKFONLY_ATM} == "YES" ]] evaluates TRUE → enters conditional block | INFO |
| 7 | prep.sh | 152 | declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV:COM_ATMOS_HISTORY_TMPL → "command not found" | FATAL |
| 8 | preamble.sh | 70 | Postamble trap fires with error code 127 | EXIT |
Fatal error message:
/scratch3/.../global-workflow/dev/job_cards/rocoto/prep.sh: line 152: declare_from_tmpl: command not found
End /scratch3/.../prep.sh at 22:22:21 with error code 127 (time elapsed: 00:00:24)
Root Cause Analysis
PR #4555 — Incomplete declare_from_tmpl Removal
PR #4555 by DavidHuber-NOAA resolves issue #4522 by:
- Removing the declare_from_tmpl() function from ush/bash_utils.sh (or removing it from the sourcing chain)
- Replacing all calls with explicit declare -rx VAR=value statements using pre-evaluated COM paths
The conversion was incomplete. Three declare_from_tmpl calls inside the DOENKFONLY_ATM block of prep.sh were not converted:
# prep.sh lines 152-156 (PR branch) — UNCONVERTED calls
if [[ ${DOENKFONLY_ATM:-"NO"} == "YES" ]]; then
MEMDIR="ensstat" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV:COM_ATMOS_HISTORY_TMPL # <-- FAILS
MEMDIR="mem001" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_MEM001_PREV:COM_ATMOS_HISTORY_TMPL # <-- would fail
RUN="gdas" YMD=${gPDY} HH=${gcyc} \
declare_from_tmpl -rx COMOUT_ATMOS_HISTORY_DET_PREV:COM_ATMOS_HISTORY_TMPL # <-- would fail
fi
Why This Only Fails in C48_gsienkf_atmDA
The DOENKFONLY_ATM variable is set to YES only when GSI EnKF-only atmospheric DA is configured. In config.base:
export DOENKFONLY_ATM=YES # Only for enkf test cases
Most other CI test cases (e.g., C48_atm, C96_atm3DVar) have DOENKFONLY_ATM=NO or unset, so this code path is never reached. This is why the failure ONLY appeared in the C48_gsienkf_atmDA test case.
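An undefined function can sit unnoticed in a script because Bash resolves command names only when a command is actually executed, so a call inside an untaken branch never triggers a lookup. A minimal sketch of that behavior, assuming a hypothetical and deliberately undefined function name, removed_helper_for_demo, which stands in for declare_from_tmpl and is not part of global-workflow:

```shell
#!/usr/bin/env bash
# removed_helper_for_demo is a hypothetical, intentionally undefined name.
run_guarded_block() {
    # $1 plays the role of DOENKFONLY_ATM
    if [ "${1:-NO}" = "YES" ]; then
        # Looked up only if this branch is actually executed.
        removed_helper_for_demo -rx SOME_VAR:SOME_TMPL
    fi
}

rc_no=0
run_guarded_block "NO" 2>/dev/null || rc_no=$?

rc_yes=0
run_guarded_block "YES" 2>/dev/null || rc_yes=$?

echo "flag=NO  -> exit ${rc_no}"
echo "flag=YES -> exit ${rc_yes}"
```

With the flag set to NO the function returns 0; with YES it returns 127, the same status reported in the gdas_prep log.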
Secondary Issue: Missing COMROOT/date/t00z
sed: can't read .../RUNTESTS/COMROOT/date/t00z: No such file or directory
jjob_header.sh: line 96: ./PDY: No such file or directory
The setpdy.sh script tried to read COMROOT/date/t00z which doesn't exist. This is a known CI infrastructure issue where the date reference file hasn't been created yet. It's caught by || true in jjob_header.sh and is NOT the cause of the fatal failure. PDY is set from the job scheduler environment, so downstream processing continues regardless.
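The effect of the || true guard can be shown in isolation. The sketch below is not the code in jjob_header.sh: fake_setpdy is a hypothetical stand-in that always fails, and the PDY value is illustrative. It shows that a guarded failure neither aborts a script running under set -e nor disturbs a PDY value that was already exported.

```shell
#!/usr/bin/env bash
set -e

# Illustrative value standing in for a PDY exported by the job environment.
export PDY="20240224"

# Hypothetical stand-in for a setpdy.sh call that cannot find its date file.
fake_setpdy() {
    echo "sed: can't read date file" >&2
    return 1
}

# Without "|| true", set -e would stop the script at this line.
fake_setpdy 2>/dev/null || true

reached_end="yes"
echo "PDY=${PDY}, reached_end=${reached_end}"
```

The trade-off is that the guard hides every failure of the wrapped command, not only the expected one, which is why the warning is worth removing at its source.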
Execution Flow (MCP-Verified)
prep.sh
├── source load_modules.sh → Load Spack/Lmod modules (78 modules)
├── source jjob_header.sh -e "prep" -c "base prep"
│ ├── source preamble.sh → Shell settings, trap setup
│ │ └── source bash_utils.sh → [FUNCTION REMOVED BY PR #4555]
│ ├── setpdy.sh → WARN: COMROOT/date/t00z missing (caught)
│ ├── source config.base → Machine=URSA, APP=ATM, DOENKFONLY_ATM=YES
│ ├── source config.prep → Prep-specific settings
│ └── source URSA.env → Platform launcher config
├── declare -rx COMIN_OBS=... → [OK] Converted explicit COM declarations
├── declare -rx COMOUT_OBS=... → [OK]
├── getdump.sh → [OK] Copy 50+ obs BUFR files
├── cpfs syndata.tcvitals → [OK]
├── declare -rx COMIN_ATMOS_HISTORY_GFS=... → [OK] Converted
└── if DOENKFONLY_ATM == YES:
└── declare_from_tmpl -rx ... → [FATAL] command not found (exit 127)
Recommendations
1. Complete the declare_from_tmpl Conversion in prep.sh
Replace the three unconverted calls in the DOENKFONLY_ATM block with explicit declarations. Template expansion uses COM_ATMOS_HISTORY_TMPL='${ROTDIR}/${RUN}.${YMD}/${HH}/${MEMDIR}/model/atmos/history':
# BEFORE (broken):
MEMDIR="ensstat" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV:COM_ATMOS_HISTORY_TMPL
MEMDIR="mem001" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_MEM001_PREV:COM_ATMOS_HISTORY_TMPL
RUN="gdas" YMD=${gPDY} HH=${gcyc} \
declare_from_tmpl -rx COMOUT_ATMOS_HISTORY_DET_PREV:COM_ATMOS_HISTORY_TMPL
# AFTER (fixed):
declare -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV="${ROTDIR}/enkf${GDUMP}.${gPDY}/${gcyc}/ensstat/model/atmos/history"
declare -rx COMIN_ATMOS_HISTORY_ENS_MEM001_PREV="${ROTDIR}/enkf${GDUMP}.${gPDY}/${gcyc}/mem001/model/atmos/history"
declare -rx COMOUT_ATMOS_HISTORY_DET_PREV="${ROTDIR}/gdas.${gPDY}/${gcyc}/model/atmos/history"
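One way to gain confidence in the explicit replacements is to expand the template string quoted above with the same per-call values and compare the result with the hard-coded path. The sketch below is a check under stated assumptions, not the removed declare_from_tmpl implementation: expand_history_tmpl is a hypothetical helper that uses eval, and the values of ROTDIR, GDUMP, gPDY, and gcyc are made up for illustration. An empty MEMDIR leaves a doubled slash in the expanded template, so the helper squeezes repeated slashes before comparing.

```shell
#!/usr/bin/env bash
# Illustrative values only; none of these come from the failing run.
ROTDIR="/tmp/demo_rotdir"
GDUMP="gdas"
gPDY="20240223"
gcyc="18"

# Template string as quoted in the recommendation text.
COM_ATMOS_HISTORY_TMPL='${ROTDIR}/${RUN}.${YMD}/${HH}/${MEMDIR}/model/atmos/history'

# Hypothetical helper: expand the template in a subshell with the given
# MEMDIR ($1) and RUN ($2), then squeeze repeated slashes.
expand_history_tmpl() {
    (
        MEMDIR="$1"
        RUN="$2"
        YMD="${gPDY}"
        HH="${gcyc}"
        eval "printf '%s\n' \"${COM_ATMOS_HISTORY_TMPL}\"" | tr -s '/'
    )
}

from_tmpl_stat="$(expand_history_tmpl "ensstat" "enkf${GDUMP}")"
from_tmpl_det="$(expand_history_tmpl "" "gdas")"

# Explicit forms, written the same way as the AFTER block.
explicit_stat="${ROTDIR}/enkf${GDUMP}.${gPDY}/${gcyc}/ensstat/model/atmos/history"
explicit_det="${ROTDIR}/gdas.${gPDY}/${gcyc}/model/atmos/history"

[ "${from_tmpl_stat}" = "${explicit_stat}" ] && echo "ensstat paths match"
[ "${from_tmpl_det}" = "${explicit_det}" ] && echo "deterministic paths match"
```

Both comparisons succeed for these values. If MEMDIR happens to be set in the calling environment when the third declaration runs, the template-based and explicit paths would differ, so that case is worth checking against the real configuration.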
2. Audit ALL Job Cards for Remaining declare_from_tmpl Calls
Run a comprehensive grep across all job cards to ensure no calls remain:
grep -rn 'declare_from_tmpl' dev/job_cards/ dev/jobs/ scripts/ jobs/
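The same search can be turned into a pass/fail gate. The sketch below wraps grep in a hypothetical function, audit_declare_from_tmpl, and inverts its exit status so that any remaining call is reported as a failure. It demonstrates the behavior on a temporary directory rather than on the real repository tree.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate: fail when any declare_from_tmpl call remains.
audit_declare_from_tmpl() {
    grep_rc=0
    grep -rn 'declare_from_tmpl' "$@" || grep_rc=$?
    case "${grep_rc}" in
        0) return 1 ;;   # matches found: leftover calls exist
        1) return 0 ;;   # no matches: tree is clean
        *) return 2 ;;   # grep itself failed (e.g. missing path)
    esac
}

# Demonstration on a throwaway directory.
demo_dir="$(mktemp -d)"

printf 'declare_from_tmpl -rx FOO:BAR_TMPL\n' > "${demo_dir}/leftover.sh"
audit_rc_dirty=0
audit_declare_from_tmpl "${demo_dir}" >/dev/null || audit_rc_dirty=$?

printf 'declare -rx FOO="/some/path"\n' > "${demo_dir}/leftover.sh"
audit_rc_clean=0
audit_declare_from_tmpl "${demo_dir}" >/dev/null || audit_rc_clean=$?

rm -rf "${demo_dir}"
echo "dirty=${audit_rc_dirty} clean=${audit_rc_clean}"
```

In a CI step the function would be called with the directories from the grep command above (dev/job_cards/ dev/jobs/ scripts/ jobs/). Treating grep errors as a separate status keeps a mistyped path from being reported as a clean tree.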
3. Add EnKF Test Coverage to PR Validation
The DOENKFONLY_ATM code path was likely missed because it only activates in enkf-specific test cases. Consider adding a CI checklist:
- Verify all conditional blocks containing declare_from_tmpl are identified pre-merge
- Run C48_gsienkf_atmDA explicitly for PRs that modify COM declaration infrastructure
4. Address the setpdy.sh Warning
While not fatal, the missing COMROOT/date/t00z generates noise in logs and masks real errors. The CI harness should either:
- Pre-create the date reference file, or
- Have setpdy.sh use PDY from the environment directly when the file is missing
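The second option could follow a pattern like the one below. This is a sketch of the fallback logic only: load_pdy is a hypothetical function, not part of setpdy.sh or jjob_header.sh, and the file path and PDY value are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer the PDY file, fall back to an exported PDY.
load_pdy() {
    pdy_file="$1"
    if [ -r "${pdy_file}" ]; then
        # Normal case: the date file exists, so use its definitions.
        . "${pdy_file}"
    elif [ -n "${PDY:-}" ]; then
        # Fallback: keep the value already provided by the environment.
        echo "INFO: ${pdy_file} not found; using PDY=${PDY} from environment" >&2
    else
        echo "FATAL: ${pdy_file} not found and PDY is unset" >&2
        return 1
    fi
}

export PDY="20240224"   # illustrative value
load_rc=0
load_pdy "/nonexistent/demo/PDY" 2>/dev/null || load_rc=$?
echo "load_rc=${load_rc} PDY=${PDY}"
```

Because the fallback emits a single informational line and still fails hard when no date is available at all, it would remove the recurring sed and source errors from the log without hiding a genuinely missing date.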
MCP Tool Scorecard
| # | Tool | Arguments | Return Value Summary | Usefulness |
|---|---|---|---|---|
| 1 | get_job_details | JGDAS_PREP | "Not found in dev/jobs/" (rocoto card, not J-Job) | LOW |
| 2 | describe_component | ush/declare_from_tmpl | Not found (it's a function, not a file) | LOW |
| 3 | find_callers_callees | declare_from_tmpl | 0 callers, 0 callees (Neo4j doesn't track shell function defs) | LOW |
| 4 | search_issues | PR 4555 | Found PR #4555 "De-template COM declarations" with CI-Ursa-Failed label | HIGH |
| 5 | get_pull_requests | (none) | Found PR #4555 details, branch, author, description | HIGH |
| 6 | trace_execution_path | prep.sh | Traced call chain: prep.sh → preamble.sh → bash_utils.sh | MEDIUM |
| 7 | describe_component | ush/preamble.sh | Confirmed file exists, 5797 bytes | LOW |
| 8 | describe_component | dev/job_cards/rocoto/prep.sh | Full content preview with first 50 lines | MEDIUM |
| 9 | find_dependencies | ush/bash_utils.sh | Confirmed: preamble.sh IMPORTS bash_utils.sh → jjob_header.sh | HIGH |
Summary: of 9 MCP tool calls, 3 were HIGH (33%), 2 MEDIUM (22%), 4 LOW (44%), and 0 FAILED. The most value came from search_issues and get_pull_requests, which immediately identified the PR's intent and failure label.
Related PRs and Issues
| # | Type | Title | Relevance |
|---|---|---|---|
| #4555 | PR | De-template COM declarations | This PR — source of the failure |
| #4522 | Issue | Remove declare_from_tmpl | Resolved by #4555 |
Environment Details
| Property | Value |
|---|---|
| Platform | URSA (SST Innovation Center) |
| Scheduler | Slurm (job 9098312 → subtask 3157550) |
| Commit | 733bbf3e |
| Branch | feature/de-template_com |
| Test Case | C48_gsienkf_atmDA |
| Module Stack | 78 modules (spack-stack 1.9.2, oneapi 2024.2.1) |
| App Config | APP=ATM, MODE=cycled, DOENKFONLY_ATM=YES |
| Analysis Date | 2024-02-24 00z cycle |
| Wall Time | 00:00:24 (failed early) |
Analysis performed using EIB MCP-RAG GraphRAG toolset (9 tool calls). Report generated February 2026.