C48_gsienkf_atmDA gdas_prep Error Analysis PR4555 - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki

CI Error Analysis: PR #4555 — gdas_prep Failure (declare_from_tmpl: command not found)

Job: gdas_prep (rocoto job card dev/job_cards/rocoto/prep.sh)
CI Test Case: C48_gsienkf_atmDA (PR_4555_C48_gsienkf_atmDA)
Platform: URSA (SST Innovation Center)
Commit: 733bbf3e (branch feature/de-template_com)
Error Code: 127 (command not found)
Date: February 2026
Source Gist: https://gist.github.com/emcbot/0fbab0d6e8182861e9a559119b65c713
PR: NOAA-EMC/global-workflow#4555 — "De-template COM declarations"


Executive Summary

The gdas_prep job failed with exit code 127 because the shell function declare_from_tmpl was not found at prep.sh line 152. PR #4555 aims to remove all calls to declare_from_tmpl and replace them with explicit declare statements, but the conversion was incomplete — unconverted calls remain inside the DOENKFONLY_ATM conditional block of prep.sh. Since DOENKFONLY_ATM=YES is only set in enkf-enabled test cases like C48_gsienkf_atmDA, this code path was likely missed during initial testing.


Error Chain

| Step | Source | Line | Event | Severity |
|------|--------|------|-------|----------|
| 1 | jjob_header.sh | 95 | setpdy.sh fails: COMROOT/date/t00z not found | WARNING (caught by || true) |
| 2 | jjob_header.sh | 96 | source ./PDY fails: PDY file never created | WARNING (caught by || true) |
| 3 | prep.sh | 31–37 | COM declarations execute successfully (already converted to explicit declare) | OK |
| 4 | prep.sh | 60 | getdump.sh copies 50+ obs files successfully | OK |
| 5 | prep.sh | 138–140 | Additional COM declarations execute successfully (converted) | OK |
| 6 | prep.sh | 151 | [[ ${DOENKFONLY_ATM} == "YES" ]] evaluates TRUE → enters conditional block | INFO |
| 7 | prep.sh | 152 | declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV:COM_ATMOS_HISTORY_TMPL → "command not found" | FATAL |
| 8 | preamble.sh | 70 | Postamble trap fires with error code 127 | EXIT |

Fatal error message:

/scratch3/.../global-workflow/dev/job_cards/rocoto/prep.sh: line 152: declare_from_tmpl: command not found
End /scratch3/.../prep.sh at 22:22:21 with error code 127 (time elapsed: 00:00:24)
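
Exit status 127 is what bash returns when a command name resolves to neither a function, a builtin, nor an executable on PATH. The sketch below reproduces that status in isolation. The command name not_a_real_command_xyz and the helper is_callable are hypothetical illustrations and are not part of global-workflow.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical): succeed if bash can resolve NAME as a
# function, builtin, keyword, alias, or executable on PATH.
is_callable() {
    type -t "$1" > /dev/null 2>&1
}

# Run an undefined command in a subshell and capture its exit status.
# The name is assumed not to exist on this system.
status=0
( not_a_real_command_xyz ) 2> /dev/null || status=$?
echo "undefined command exit status: ${status}"

# Once a function of that name is defined, the same name resolves.
not_a_real_command_xyz() { return 0; }
if is_callable not_a_real_command_xyz; then
    echo "now resolvable as: $(type -t not_a_real_command_xyz)"
fi
```

A guard such as is_callable could be used before a call to confirm that a function is still defined in the sourcing chain, although completing the conversion removes the need for it.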

Root Cause Analysis

PR #4555 — Incomplete declare_from_tmpl Removal

PR #4555 by DavidHuber-NOAA resolves issue #4522 by:

  1. Removing the declare_from_tmpl() function from ush/bash_utils.sh (or removing it from the sourcing chain)
  2. Replacing all calls with explicit declare -rx VAR=value statements using pre-evaluated COM paths

The conversion was incomplete. Three declare_from_tmpl calls inside the DOENKFONLY_ATM block of prep.sh were not converted:

# prep.sh lines 152-156 (PR branch) — UNCONVERTED calls
if [[ ${DOENKFONLY_ATM:-"NO"} == "YES" ]]; then
    MEMDIR="ensstat" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
        declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV:COM_ATMOS_HISTORY_TMPL   # <-- FAILS
    MEMDIR="mem001" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
        declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_MEM001_PREV:COM_ATMOS_HISTORY_TMPL  # <-- would fail
    RUN="gdas" YMD=${gPDY} HH=${gcyc} \
        declare_from_tmpl -rx COMOUT_ATMOS_HISTORY_DET_PREV:COM_ATMOS_HISTORY_TMPL        # <-- would fail
fi

Why This Only Fails in C48_gsienkf_atmDA

The DOENKFONLY_ATM variable is set to YES only when GSI EnKF-only atmospheric DA is configured. In config.base:

export DOENKFONLY_ATM=YES  # Only for enkf test cases

Most other CI test cases (e.g., C48_atm, C96_atm3DVar) have DOENKFONLY_ATM=NO or unset, so this code path is never reached. This is why the failure ONLY appeared in the C48_gsienkf_atmDA test case.
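
The reason an untaken branch hides the problem is that bash looks up a command name only when the line is actually executed. A minimal sketch of that behaviour follows; run_prep_like and undefined_tmpl_helper are hypothetical stand-ins for prep.sh and the removed function, and the real job relies on its error trap rather than an explicit return.

```shell
#!/usr/bin/env bash
# Sketch: a call to a missing function fails only if its branch is taken.
# undefined_tmpl_helper is a hypothetical stand-in for declare_from_tmpl.
run_prep_like() {
    local do_enkf="${1:-NO}"
    if [[ "${do_enkf}" == "YES" ]]; then
        # Only resolved (and only fails) when this line is reached.
        undefined_tmpl_helper -rx SOME_VAR:SOME_TMPL || return $?
    fi
    echo "prep-like job finished"
}

rc_no=0
run_prep_like "NO" > /dev/null 2>&1 || rc_no=$?
rc_yes=0
run_prep_like "YES" > /dev/null 2>&1 || rc_yes=$?

echo "DOENKFONLY_ATM=NO  -> exit ${rc_no}"
echo "DOENKFONLY_ATM=YES -> exit ${rc_yes}"
```

Static checks (a grep, or a linter run over the job cards) catch this class of error regardless of which branches a given test case exercises.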

Secondary Issue: Missing COMROOT/date/t00z

sed: can't read .../RUNTESTS/COMROOT/date/t00z: No such file or directory
jjob_header.sh: line 96: ./PDY: No such file or directory

The setpdy.sh script tried to read COMROOT/date/t00z which doesn't exist. This is a known CI infrastructure issue where the date reference file hasn't been created yet. It's caught by || true in jjob_header.sh and is NOT the cause of the fatal failure. PDY is set from the job scheduler environment, so downstream processing continues regardless.
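
The effect of the || true guard can be sketched as follows. Both functions are hypothetical illustrations rather than code from jjob_header.sh: the first discards the failure silently, while the second keeps the status so that a warning can be logged.

```shell
#!/usr/bin/env bash
# Sketch of the "|| true" pattern under errexit (hypothetical functions).
# Each body runs in a subshell so "set -e" does not leak to the caller.

# Swallows the failure: the script continues and the status is lost.
read_date_file() (
    set -e
    cat "$1" 2> /dev/null || true
    echo "continued"
)

# Keeps the status so the failure is still visible in the log.
read_date_file_logged() (
    set -e
    local rc=0
    cat "$1" 2> /dev/null || rc=$?
    if (( rc != 0 )); then
        echo "WARNING: could not read $1 (status ${rc}); continuing" >&2
    fi
    echo "continued"
)
```

Either form lets the job proceed; the second makes it easier to tell an expected miss from an unexpected one when reading logs.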


Execution Flow (MCP-Verified)

prep.sh
  ├── source load_modules.sh                          → Load Spack/Lmod modules (78 modules)
  ├── source jjob_header.sh -e "prep" -c "base prep"
  │     ├── source preamble.sh                        → Shell settings, trap setup
  │     │     └── source bash_utils.sh                → [FUNCTION REMOVED BY PR #4555]
  │     ├── setpdy.sh                                 → WARN: COMROOT/date/t00z missing (caught)
  │     ├── source config.base                        → Machine=URSA, APP=ATM, DOENKFONLY_ATM=YES
  │     ├── source config.prep                        → Prep-specific settings
  │     └── source URSA.env                           → Platform launcher config
  ├── declare -rx COMIN_OBS=...                       → [OK] Converted explicit COM declarations
  ├── declare -rx COMOUT_OBS=...                      → [OK]
  ├── getdump.sh                                      → [OK] Copy 50+ obs BUFR files
  ├── cpfs syndata.tcvitals                           → [OK]
  ├── declare -rx COMIN_ATMOS_HISTORY_GFS=...         → [OK] Converted
  └── if DOENKFONLY_ATM == YES:
        └── declare_from_tmpl -rx ...                 → [FATAL] command not found (exit 127)

Recommendations

1. Complete the declare_from_tmpl Conversion in prep.sh

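Replace the three unconverted calls in the DOENKFONLY_ATM block with explicit declarations. The expansions below assume COM_ATMOS_HISTORY_TMPL='${ROTDIR}/${RUN}.${YMD}/${HH}/${MEMDIR}/model/atmos/history'. The third call sets no MEMDIR, so its explicit path omits that component (this assumes MEMDIR is empty or unset at that point):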

# BEFORE (broken):
MEMDIR="ensstat" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
    declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV:COM_ATMOS_HISTORY_TMPL
MEMDIR="mem001" RUN="enkf${GDUMP}" YMD=${gPDY} HH=${gcyc} \
    declare_from_tmpl -rx COMIN_ATMOS_HISTORY_ENS_MEM001_PREV:COM_ATMOS_HISTORY_TMPL
RUN="gdas" YMD=${gPDY} HH=${gcyc} \
    declare_from_tmpl -rx COMOUT_ATMOS_HISTORY_DET_PREV:COM_ATMOS_HISTORY_TMPL

# AFTER (fixed):
declare -rx COMIN_ATMOS_HISTORY_ENS_STAT_PREV="${ROTDIR}/enkf${GDUMP}.${gPDY}/${gcyc}/ensstat/model/atmos/history"
declare -rx COMIN_ATMOS_HISTORY_ENS_MEM001_PREV="${ROTDIR}/enkf${GDUMP}.${gPDY}/${gcyc}/mem001/model/atmos/history"
declare -rx COMOUT_ATMOS_HISTORY_DET_PREV="${ROTDIR}/gdas.${gPDY}/${gcyc}/model/atmos/history"
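
One way to gain confidence in a hand conversion like this is to expand the template mechanically and compare the result with the explicit path. The sketch below does that with sample values. expand_tmpl and squeeze_slashes are hypothetical helpers (expand_tmpl is not the removed declare_from_tmpl), the ROTDIR, GDUMP, gPDY, and gcyc values are made up, and MEMDIR is assumed to be empty for the deterministic case.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the removed declare_from_tmpl: expand a template
# string against the variables visible at call time. eval is tolerable here
# only because the template is a trusted constant.
expand_tmpl() {
    eval "printf '%s\n' \"$1\""
}

# Collapse repeated slashes so "a//b" and "a/b" compare equal.
squeeze_slashes() {
    printf '%s\n' "$1" | tr -s '/'
}

# Template text as quoted in this report; the values below are made up.
COM_ATMOS_HISTORY_TMPL='${ROTDIR}/${RUN}.${YMD}/${HH}/${MEMDIR}/model/atmos/history'
ROTDIR="/tmp/rotdir"; GDUMP="gdas"; gPDY="20240223"; gcyc="18"

# Ensemble-statistics path: template expansion vs. explicit declaration.
tmpl_stat=$(MEMDIR="ensstat" RUN="enkf${GDUMP}" YMD="${gPDY}" HH="${gcyc}" \
    expand_tmpl "${COM_ATMOS_HISTORY_TMPL}")
explicit_stat="${ROTDIR}/enkf${GDUMP}.${gPDY}/${gcyc}/ensstat/model/atmos/history"

# Deterministic path: MEMDIR is assumed empty, which leaves a doubled slash.
tmpl_det=$(MEMDIR="" RUN="gdas" YMD="${gPDY}" HH="${gcyc}" \
    expand_tmpl "${COM_ATMOS_HISTORY_TMPL}")
explicit_det="${ROTDIR}/gdas.${gPDY}/${gcyc}/model/atmos/history"

[[ "${tmpl_stat}" == "${explicit_stat}" ]] && echo "ensstat paths match"
[[ "$(squeeze_slashes "${tmpl_det}")" == "${explicit_det}" ]] && echo "det paths match after slash squeeze"
```

Under these assumptions the deterministic path differs from the template expansion only by a doubled slash, which the filesystem treats as the same directory but which matters if any downstream code compares path strings.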

2. Audit ALL Job Cards for Remaining declare_from_tmpl Calls

Run a comprehensive grep across all job cards to ensure no calls remain:

grep -rn 'declare_from_tmpl' dev/job_cards/ dev/jobs/ scripts/ jobs/
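
The same search can be wrapped so that it ignores commented-out lines and returns a non-zero status when anything remains, which makes it usable as a pre-merge check. This is a sketch only; audit_declare_from_tmpl is a hypothetical name, and the comment filter assumes file names contain no colons.

```shell
#!/usr/bin/env bash
# Sketch: report non-comment uses of declare_from_tmpl under the given
# directories. Returns 1 if any are found, 0 otherwise.
audit_declare_from_tmpl() {
    local hits
    # -r recurse, -n show line numbers, -s stay quiet about missing paths.
    # The second grep drops lines whose content starts with '#'.
    hits=$(grep -rns 'declare_from_tmpl' "$@" \
        | grep -Ev '^[^:]+:[0-9]+:[[:space:]]*#' || true)
    if [[ -n "${hits}" ]]; then
        printf '%s\n' "${hits}"
        return 1
    fi
    return 0
}
```

It would be called with the same directories as the grep above, for example audit_declare_from_tmpl dev/job_cards dev/jobs scripts jobs, and at least one directory should always be passed.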

3. Add EnKF Test Coverage to PR Validation

The DOENKFONLY_ATM code path was missed because it is exercised only by EnKF-specific test cases. Consider adding the following to the CI checklist:

  • Verify all conditional blocks containing declare_from_tmpl are identified pre-merge
  • Run C48_gsienkf_atmDA explicitly for PRs that modify COM declaration infrastructure

4. Address the setpdy.sh Warning

While not fatal, the missing COMROOT/date/t00z adds noise to the logs and can mask real errors. The CI harness should either:

  • Pre-create the date reference file, or
  • Have setpdy.sh use PDY from the environment directly when the file is missing
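
The second option could look roughly like the sketch below. resolve_pdy is a hypothetical function, not the actual setpdy.sh logic, and it assumes the date appears as the first 8-digit run on the first line of the reference file; the real file layout may differ.

```shell
#!/usr/bin/env bash
# Hypothetical fallback (not the actual setpdy.sh logic): prefer the date
# reference file when it is readable, otherwise use PDY from the environment.
resolve_pdy() {
    local date_file="$1"
    if [[ -r "${date_file}" ]]; then
        # Assumption: the date is the first 8-digit run on the first line.
        head -n 1 "${date_file}" | grep -oE '[0-9]{8}' | head -n 1
    elif [[ -n "${PDY:-}" ]]; then
        echo "INFO: ${date_file} missing; using PDY=${PDY} from environment" >&2
        printf '%s\n' "${PDY}"
    else
        echo "FATAL: no date file and PDY is unset" >&2
        return 1
    fi
}
```

With this shape the missing file produces one informational line instead of a sed error followed by a failed source, and a genuinely missing date still stops the job.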

MCP Tool Scorecard

| # | Tool | Arguments | Return Value Summary | Usefulness |
|---|------|-----------|----------------------|------------|
| 1 | get_job_details | JGDAS_PREP | "Not found in dev/jobs/" (rocoto card, not a J-Job) | LOW |
| 2 | describe_component | ush/declare_from_tmpl | Not found (it's a function, not a file) | LOW |
| 3 | find_callers_callees | declare_from_tmpl | 0 callers, 0 callees (Neo4j doesn't track shell function defs) | LOW |
| 4 | search_issues | PR 4555 | Found PR #4555 "De-template COM declarations" with CI-Ursa-Failed label | HIGH |
| 5 | get_pull_requests | (none) | Found PR #4555 details, branch, author, description | HIGH |
| 6 | trace_execution_path | prep.sh | Traced call chain: prep.sh → preamble.sh → bash_utils.sh | MEDIUM |
| 7 | describe_component | ush/preamble.sh | Confirmed file exists, 5797 bytes | LOW |
| 8 | describe_component | dev/job_cards/rocoto/prep.sh | Full content preview with first 50 lines | MEDIUM |
| 9 | find_dependencies | ush/bash_utils.sh | Confirmed: preamble.sh IMPORTS bash_utils.sh → jjob_header.sh | HIGH |

Summary: 9 MCP tool calls, of which 3 were HIGH (33%), 2 MEDIUM (22%), 4 LOW (44%), and 0 FAILED. The most useful were search_issues and get_pull_requests, which immediately identified the PR's intent and its CI failure label.


Related PRs and Issues

| # | Type | Title | Relevance |
|---|------|-------|-----------|
| #4555 | PR | De-template COM declarations | This PR; source of the failure |
| #4522 | Issue | Remove declare_from_tmpl | Resolved by #4555 |

Environment Details

| Property | Value |
|----------|-------|
| Platform | URSA (SST Innovation Center) |
| Scheduler | Slurm (job 9098312 → subtask 3157550) |
| Commit | 733bbf3e |
| Branch | feature/de-template_com |
| Test Case | C48_gsienkf_atmDA |
| Module Stack | 78 modules (spack-stack 1.9.2, oneapi 2024.2.1) |
| App Config | APP=ATM, MODE=cycled, DOENKFONLY_ATM=YES |
| Analysis Date | 2024-02-24 00z cycle |
| Wall Time | 00:00:24 (failed early) |

Analysis performed using EIB MCP-RAG GraphRAG toolset (9 tool calls). Report generated February 2026.