MCP RAG Baseline Test_C96_atm3DVar_extended_Analysis - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki

MCP/RAG Baseline Test: C96_atm3DVar_extended CI Case Analysis

Date: March 20, 2026
Platform: Kiro IDE on EC2 (aarch64) → eib-mcp-gateway v2.0.1 (Docker MCP Gateway) → global-workflow-unified-mcp v3.6.2
MCP Server: 51 tools, ChromaDB (85,995 docs / 6 collections), Neo4j (2,758 files / 2.6M relationships)
Purpose: Baseline quality test for porting the MCP/RAG system to the AWS Bedrock ecosystem


Question Asked

What is the C96_atm3DVar_extended.yaml CI case about and why is it called "extended"?

Answer Produced

This CI case is a cycled atmosphere-only GFS experiment using 3D-Var data assimilation at C96 resolution (~100km). It is called "extended" for two distinct reasons:

1. Extended Cycling (More DA Cycles)

| | Standard (C96_atm3DVar) | Extended (C96_atm3DVar_extended) |
|---|---|---|
| idate | 2021122018 | 2021122018 |
| edate | 2021122106 | 2021122118 |
| interval | 24 hours | 6 hours |
| Cycles run | ~1 (24h gap) | ~4 (6h intervals over 24h) |

The standard case runs roughly one cycle with a 24-hour interval. The extended case runs 4 cycles at 6-hour intervals (the operational GFS cadence: 00Z, 06Z, 12Z, 18Z), covering a full day of cycling. This tests the full data assimilation cycling loop — analysis → forecast → analysis — multiple times.
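Expressed as config, the cycling difference amounts to just a few fields. A minimal sketch of the two case files, assuming the usual global-workflow CI case layout (the arguments: nesting and the yaml path are assumptions; only the idate/edate/interval values come from the comparison above):

```yaml
# C96_atm3DVar.yaml (standard) -- sketch; only the fields discussed
# here are shown, and the surrounding schema is assumed
arguments:
  idate: 2021122018
  edate: 2021122106         # ends after a single cycle
  interval: 24              # hours between cycles
  yaml: ci/cases/yamls/gfs_metp_ci.yaml
---
# C96_atm3DVar_extended.yaml (extended) -- same sketch caveats
arguments:
  idate: 2021122018
  edate: 2021122118         # covers a full day of cycling
  interval: 6               # operational 00Z/06Z/12Z/18Z cadence
  yaml: ci/cases/yamls/gfs_extended_ci.yaml
```

Nothing else about the experiment changes between the two cases at this level; the heavier behavior comes from the referenced yaml file, described next.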

2. Extended Downstream Product Generation

The standard case uses gfs_metp_ci.yaml which only enables METplus verification (DO_METP: YES). The extended case uses gfs_extended_ci.yaml which turns on the full suite of downstream post-processing products:

  • DO_GOES: YES — GOES satellite simulated imagery
  • DO_BUFRSND: YES — BUFR sounding products
  • DO_GEMPAK: YES — GEMPAK graphical products (the JGFS_ATMOS_GEMPAK* jobs)
  • DO_AWIPS: YES — AWIPS products for NWS forecasters
  • DO_NPOESS: YES — NPOESS special GRIB2 products (JGFS_ATMOS_PGRB2_SPEC_NPOESS)
  • DO_METP: YES — METplus verification
  • DO_ARCHCOM: YES — COM archive
  • FHMAX_GFS: 384 — 16-day forecast (longer than the default)
  • FHMAX_HF_GFS: 120 — High-frequency output for first 5 days
  • FCST_BREAKPOINTS: 192 — Forecast segmentation at hour 192

It also bumps taskthrottle from 25 to 75 to handle the much larger number of concurrent jobs.
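Collected into one file, the settings above would look something like the following sketch of gfs_extended_ci.yaml (the flag names and values are taken from the list above; the top-level nesting under base: is an assumption):

```yaml
# gfs_extended_ci.yaml -- hedged sketch; values from the analysis,
# surrounding structure assumed
base:
  DO_GOES: "YES"            # GOES satellite simulated imagery
  DO_BUFRSND: "YES"         # BUFR sounding products
  DO_GEMPAK: "YES"          # GEMPAK graphical products
  DO_AWIPS: "YES"           # AWIPS products for NWS forecasters
  DO_NPOESS: "YES"          # NPOESS special GRIB2 products
  DO_METP: "YES"            # METplus verification
  DO_ARCHCOM: "YES"         # COM archive
  FHMAX_GFS: 384            # 16-day forecast
  FHMAX_HF_GFS: 120         # high-frequency output for first 5 days
  FCST_BREAKPOINTS: 192     # segment the forecast at hour 192
```

By contrast, a sketch of the standard gfs_metp_ci.yaml would contain only the single line DO_METP: "YES" under the same nesting.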

Platform Restrictions

The skip_ci_on_hosts list includes hera, ursa, gaeac6, orion, hercules, and AWS — effectively everywhere except WCOSS2 (the production machine). This is a heavyweight test that exercises the near-complete operational pipeline, so it runs only on the production-class platform. The standard case is the opposite: it skips WCOSS2 and AWS but runs on the research machines.
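As a config sketch, the restriction is a single list (host names are from the report; the exact label used for the AWS host is an assumption):

```yaml
# hedged sketch of the platform restriction in C96_atm3DVar_extended.yaml
skip_ci_on_hosts:
  - hera
  - ursa
  - gaeac6
  - orion
  - hercules
  - aws       # exact AWS host label as written in the file is assumed
```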

In short: "extended" = more cycles + more products + longer forecast = closer to what actually runs in NOAA operations.


MCP Tool Call Analysis

The following table records every MCP tool call made during this investigation, the response quality, and how useful each call was in arriving at the final answer.

Scoring Key

  • 5/5 — Directly contributed critical information to the answer
  • 4/5 — Provided useful supporting context
  • 3/5 — Provided relevant background but not directly used
  • 2/5 — Returned results but low relevance to the question
  • 1/5 — No useful results returned

Tool Call Log

| # | MCP Tool | Arguments | Response Summary | Score | Notes |
|---|----------|-----------|------------------|-------|-------|
| 1 | get_knowledge_base_status | include_graph=true, include_vector=true | 85,995 docs across 6 collections, 2,758 files in Neo4j, 2.6M relationships, 314 shell scripts, 2,724 env vars | 4/5 | Confirmed the knowledge base was healthy and available before starting the investigation. Essential sanity check. |
| 2 | get_server_info | include_capabilities=true | Server v3.6.2, 51 tools across 8 categories, RAG enabled, GitHub enabled | 4/5 | Verified full tool availability. Confirmed we had semantic search, code analysis, and job detail tools ready. |
| 3 | mcp_health_check | deep=true, detailed=true | HEALTHY 7/7 components, sample query returned 10 results, all collections validated | 4/5 | Deep validation confirmed ChromaDB queries were working. Important for trusting subsequent search results. |
| 4 | explain_with_context | topic="CI test cases extended vs standard atm3DVar", detail_level="advanced" | Returned minimal summary with no substantive content | 1/5 | The hybrid search did not find specific content about "extended" CI cases. The topic was too specific for the general documentation corpus. |
| 5 | search_documentation | collection="global-workflow-docs-v8-0-0", query="gfs_extended_ci.yaml CI workflow configuration" | 10 results (37-48% similarity). Best hits: Contributing docs explaining the CI test framework, generate_workflows.sh usage, CTest configuration | 3/5 | Did not find the specific file, but the Contributing docs explained the CI framework structure (dev/ci/cases/pr/ directory, YAML format, Rocoto workflow), providing the conceptual framework for understanding the case. |
| 6 | search_documentation | query="C96 atm3DVar extended CI test case" | 10 results (59-61% similarity), mostly CICE testing docs and UFS regression testing, not global-workflow CI | 2/5 | The semantic search matched on "test case" and "extended" but returned CICE and UFS testing docs rather than global-workflow CI specifics. The query was too generic for the vector space. |
| 7 | find_env_dependencies | variable_name="FHMAX_GFS" | GGSR analysis found 10 dependent scripts: exglobal_cleanup.sh, exgfs_atmos_grib2_special_npoess.sh, exglobal_extractvars.sh, parsing_ufs_configure.sh, plus 5 GEMPAK jobs. Also found 5 config files that SET this variable. | 5/5 | Critical finding. Confirmed that FHMAX_GFS (set to 384 in the extended config) directly controls forecast length and is consumed by NPOESS, GEMPAK, cleanup, and extract scripts: exactly the downstream products the extended case enables. Connected the config setting to actual job execution. |
| 8 | search_documentation | query="FHMAX_GFS forecast hours extended DO_GOES DO_BUFRSND DO_GEMPAK DO_AWIPS DO_NPOESS post-processing products" | 8 results (45-47% similarity), mostly EE2 file naming standards and the GFS Configuration overview | 3/5 | The GFS Configuration doc confirmed the operational pipeline structure (analysis → forecast → post-processing → verification) and that downstream jobs like AWIPS and GEMPAK are "not normally run in developmental tests", which is exactly why the extended case is special. |
| 9 | list_job_scripts | search="GEMPAK" | Found 7 GEMPAK jobs: JGDAS_ATMOS_GEMPAK, JGDAS_ATMOS_GEMPAK_META_NCDC, JGFS_ATMOS_GEMPAK, JGFS_ATMOS_GEMPAK_META, JGFS_ATMOS_GEMPAK_NCDC_UPAPGIF, JGFS_ATMOS_GEMPAK_PGRB2_SPEC, JGLOBAL_WAVE_GEMPAK | 4/5 | Confirmed the scope of GEMPAK jobs activated by DO_GEMPAK: YES. A substantial set of 7 jobs, explaining why taskthrottle needs to increase from 25 to 75. |
| 10 | get_job_details | job_name="JGFS_ATMOS_PGRB2_SPEC_NPOESS" | Path, 53 lines, sources jjob_standard_vars.sh and jjob_shell_setup.sh, outputs COMOUT_ATMOS_GOES, uses OMP_NUM_THREADS. ChromaDB found the TODO comment: "This job is not part of the rocoto suite" | 4/5 | Confirmed the NPOESS job exists and produces GOES-related output. The TODO comment about not being in the rocoto suite is interesting context: DO_NPOESS in the extended config may be testing a job that is still being integrated. |
| 11 | search_architecture | query="GFS post-processing products GOES BUFRSND GEMPAK AWIPS NPOESS downstream jobs" | 5 results, all low relevance (negative scores): AQM, CMAQ, and GSI communities rather than GFS post-processing | 1/5 | The architecture search (community summaries) has no community specifically about GFS post-processing products. This is a gap in the hierarchical community structure: the downstream product generation pipeline is not well represented as a distinct subsystem. |

Summary Statistics

| Metric | Value |
|--------|-------|
| Total MCP tool calls | 11 |
| Calls scoring 4-5/5 | 6 (55%) |
| Calls scoring 3/5 | 2 (18%) |
| Calls scoring 1-2/5 | 3 (27%) |
| Average score | 3.2 / 5.0 |

Non-MCP Sources Used

In addition to MCP tools, the following direct file reads were essential:

| Source | Method | Contribution |
|--------|--------|--------------|
| C96_atm3DVar_extended.yaml | User-provided in chat | The primary artifact being analyzed |
| C96_atm3DVar.yaml | readFile (Kiro native) | Side-by-side comparison revealed the key differences: edate, interval, yaml reference, skip_ci_on_hosts, taskthrottle |
| gfs_extended_ci.yaml | readFile (Kiro native) | The referenced config showing all the DO_* flags and FHMAX_GFS: 384 |
| gfs_metp_ci.yaml | readFile (Kiro native) | The standard config showing only DO_METP: YES — the minimal baseline |
| C48_S2SW_extended.yaml | readFile (Kiro native) | Confirmed the "extended" pattern is consistent across case types (same gfs_extended_ci.yaml, same skip list) |
| CI cases directory listing | listDirectory (Kiro native) | Showed the full set of 21 CI cases, revealing only 2 have "extended" variants |
| 3 Copilot instruction files | readFile (Kiro native) | Provided domain context: system architecture, job-to-script patterns, CI framework docs |

Analysis: What Worked and What Didn't

What Worked Well

  1. find_env_dependencies (Call #7) was the star performer. By tracing FHMAX_GFS through the Neo4j graph, it connected the config setting to actual execution scripts and jobs, proving the extended case activates NPOESS, GEMPAK, and other downstream processing.

  2. get_job_details (Call #10) provided concrete evidence about the NPOESS job structure, confirming it's a real post-processing job with specific outputs.

  3. list_job_scripts (Call #9) quantified the GEMPAK job family (7 jobs), explaining the taskthrottle increase.

  4. Health/status tools (Calls #1-3) were essential for confirming the system was operational before relying on search results.

What Didn't Work Well

  1. explain_with_context (Call #4) returned essentially nothing. The topic was too specific for the hybrid search to find relevant content.

  2. search_architecture (Call #11) returned irrelevant communities. The GFS post-processing pipeline isn't represented as a distinct community in the hierarchical graph — this is a coverage gap worth addressing in future ingestion.

  3. search_documentation (Calls #5, #6) returned moderate-relevance results. The CI case YAML files themselves aren't in the ChromaDB corpus (they're config files, not documentation), so semantic search could only find surrounding documentation about the CI framework.

Key Insight for AWS Port

The most valuable MCP tools for this type of question were the graph-based tools (find_env_dependencies, get_job_details, list_job_scripts) rather than the semantic search tools (search_documentation, explain_with_context, search_architecture). This suggests that when porting to AWS:

  • Neptune (graph DB) will be the primary value driver for code understanding queries
  • OpenSearch (vector DB) will be more useful for documentation-style questions
  • The hybrid approach (graph + vector) remains important, but graph tools should be prioritized for code-structural questions

This report serves as a baseline quality benchmark for the MCP/RAG system. When the system is ported to AWS (Bedrock Knowledge Base + OpenSearch + Neptune), this same question should be re-run to compare answer quality, tool effectiveness, and response latency.

Generated by Kiro IDE using eib-mcp-gateway MCP tools on March 20, 2026.