MCP RAG Baseline Test_C96_atm3DVar_extended_Analysis - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki

MCP/RAG Baseline Test: C96_atm3DVar_extended CI Case Analysis

Date: March 20, 2026
Platform: Kiro IDE on EC2 (aarch64) → eib-mcp-gateway v2.0.1 (Docker MCP Gateway) → global-workflow-unified-mcp v3.6.2
MCP Server: 51 tools, ChromaDB (85,995 docs / 6 collections), Neo4j (2,758 files / 2.6M relationships)
Purpose: Baseline quality test for porting the MCP/RAG system to the AWS Bedrock ecosystem


Question Asked

What is the C96_atm3DVar_extended.yaml CI case about and why is it called "extended"?

Answer Produced

This CI case is a cycled atmosphere-only GFS experiment using 3D-Var data assimilation at C96 resolution (~100km). It is called "extended" for two distinct reasons:

1. Extended Cycling (More DA Cycles)

| | Standard (C96_atm3DVar) | Extended (C96_atm3DVar_extended) |
|---|---|---|
| idate | 2021122018 | 2021122018 |
| edate | 2021122106 | 2021122118 |
| interval | 24 hours | 6 hours |
| Cycles run | ~1 (24h gap) | ~4 (6h intervals over 24h) |

The standard case runs roughly one cycle with a 24-hour interval. The extended case runs 4 cycles at 6-hour intervals (the operational GFS cadence: 00Z, 06Z, 12Z, 18Z), covering a full day of cycling. This tests the full data assimilation cycling loop — analysis → forecast → analysis — multiple times.
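Expressed as config, the cycling difference amounts to just a few fields. A minimal sketch of the two case files, assuming the usual global-workflow CI case layout (the arguments: nesting and the yaml path are assumptions; only the idate/edate/interval values come from the comparison above):

```yaml
# C96_atm3DVar.yaml (standard) -- sketch; only the fields discussed
# here are shown, and the surrounding schema is assumed
arguments:
  idate: 2021122018
  edate: 2021122106         # ends after a single cycle
  interval: 24              # hours between cycles
  yaml: ci/cases/yamls/gfs_metp_ci.yaml
---
# C96_atm3DVar_extended.yaml (extended) -- same sketch caveats
arguments:
  idate: 2021122018
  edate: 2021122118         # covers a full day of cycling
  interval: 6               # operational 00Z/06Z/12Z/18Z cadence
  yaml: ci/cases/yamls/gfs_extended_ci.yaml
```

Nothing else about the experiment changes between the two cases at this level; the heavier behavior comes from the referenced yaml file, described next.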

2. Extended Downstream Product Generation

The standard case uses gfs_metp_ci.yaml which only enables METplus verification (DO_METP: YES). The extended case uses gfs_extended_ci.yaml which turns on the full suite of downstream post-processing products:

  • DO_GOES: YES — GOES satellite simulated imagery
  • DO_BUFRSND: YES — BUFR sounding products
  • DO_GEMPAK: YES — GEMPAK graphical products (the JGFS_ATMOS_GEMPAK* jobs)
  • DO_AWIPS: YES — AWIPS products for NWS forecasters
  • DO_NPOESS: YES — NPOESS special GRIB2 products (JGFS_ATMOS_PGRB2_SPEC_NPOESS)
  • DO_METP: YES — METplus verification
  • DO_ARCHCOM: YES — COM archive
  • FHMAX_GFS: 384 — 16-day forecast (longer than the default)
  • FHMAX_HF_GFS: 120 — High-frequency output for first 5 days
  • FCST_BREAKPOINTS: 192 — Forecast segmentation at hour 192

It also bumps taskthrottle from 25 to 75 to handle the much larger number of concurrent jobs.
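Collected into one file, the settings above would look something like the following sketch of gfs_extended_ci.yaml (the flag names and values are taken from the list above; the top-level nesting under base: is an assumption):

```yaml
# gfs_extended_ci.yaml -- hedged sketch; values from the analysis,
# surrounding structure assumed
base:
  DO_GOES: "YES"            # GOES satellite simulated imagery
  DO_BUFRSND: "YES"         # BUFR sounding products
  DO_GEMPAK: "YES"          # GEMPAK graphical products
  DO_AWIPS: "YES"           # AWIPS products for NWS forecasters
  DO_NPOESS: "YES"          # NPOESS special GRIB2 products
  DO_METP: "YES"            # METplus verification
  DO_ARCHCOM: "YES"         # COM archive
  FHMAX_GFS: 384            # 16-day forecast
  FHMAX_HF_GFS: 120         # high-frequency output for first 5 days
  FCST_BREAKPOINTS: 192     # segment the forecast at hour 192
```

By contrast, a sketch of the standard gfs_metp_ci.yaml would contain only the single line DO_METP: "YES" under the same nesting.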

Platform Restrictions

The skip_ci_on_hosts list includes hera, ursa, gaeac6, orion, hercules, and AWS — effectively everywhere except WCOSS2 (the production machine). This is a heavyweight test that exercises the near-complete operational pipeline, so it runs only on the production-class platform. The standard case is the opposite: it skips WCOSS2 and AWS but runs on the research machines.
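As a config sketch, the restriction is a single list (host names are from the report; the exact label used for the AWS host is an assumption):

```yaml
# hedged sketch of the platform restriction in C96_atm3DVar_extended.yaml
skip_ci_on_hosts:
  - hera
  - ursa
  - gaeac6
  - orion
  - hercules
  - aws       # exact AWS host label as written in the file is assumed
```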

In short: "extended" = more cycles + more products + longer forecast = closer to what actually runs in NOAA operations.


MCP Tool Call Analysis

The following table records every MCP tool call made during this investigation, the response quality, and how useful each call was in arriving at the final answer.

Scoring Key

  • 5/5 — Directly contributed critical information to the answer
  • 4/5 — Provided useful supporting context
  • 3/5 — Provided relevant background but not directly used
  • 2/5 — Returned results but low relevance to the question
  • 1/5 — No useful results returned

Tool Call Log

| # | MCP Tool | Arguments | Response Summary | Score | Notes |
|---|----------|-----------|------------------|-------|-------|
| 1 | get_knowledge_base_status | include_graph=true, include_vector=true | 85,995 docs across 6 collections, 2,758 files in Neo4j, 2.6M relationships, 314 shell scripts, 2,724 env vars | 4/5 | Confirmed the knowledge base was healthy and available before starting the investigation. Essential sanity check. |
| 2 | get_server_info | include_capabilities=true | Server v3.6.2, 51 tools across 8 categories, RAG enabled, GitHub enabled | 4/5 | Verified full tool availability. Confirmed we had semantic search, code analysis, and job detail tools ready. |
| 3 | mcp_health_check | deep=true, detailed=true | HEALTHY 7/7 components, sample query returned 10 results, all collections validated | 4/5 | Deep validation confirmed ChromaDB queries were working. Important for trusting subsequent search results. |
| 4 | explain_with_context | topic="CI test cases extended vs standard atm3DVar", detail_level="advanced" | Returned minimal summary with no substantive content | 1/5 | The hybrid search did not find specific content about "extended" CI cases. The topic was too specific for the general documentation corpus. |
| 5 | search_documentation | collection="global-workflow-docs-v8-0-0", query="gfs_extended_ci.yaml CI workflow configuration" | 10 results (37-48% similarity). Best hits: Contributing docs explaining the CI test framework, generate_workflows.sh usage, CTest configuration | 3/5 | Did not find the specific file, but the Contributing docs explained the CI framework structure (dev/ci/cases/pr/ directory, YAML format, Rocoto workflow), providing the conceptual framework for understanding the case. |
| 6 | search_documentation | query="C96 atm3DVar extended CI test case" | 10 results (59-61% similarity), mostly CICE testing docs and UFS regression testing, not global-workflow CI | 2/5 | The semantic search matched on "test case" and "extended" but returned CICE and UFS testing docs rather than global-workflow CI specifics. The query was too generic for the vector space. |
| 7 | find_env_dependencies | variable_name="FHMAX_GFS" | GGSR analysis found 10 dependent scripts: exglobal_cleanup.sh, exgfs_atmos_grib2_special_npoess.sh, exglobal_extractvars.sh, parsing_ufs_configure.sh, plus 5 GEMPAK jobs. Also found 5 config files that SET this variable. | 5/5 | Critical finding. Confirmed that FHMAX_GFS (set to 384 in the extended config) directly controls forecast length and is consumed by NPOESS, GEMPAK, cleanup, and extract scripts: exactly the downstream products the extended case enables. Connected the config setting to actual job execution. |
| 8 | search_documentation | query="FHMAX_GFS forecast hours extended DO_GOES DO_BUFRSND DO_GEMPAK DO_AWIPS DO_NPOESS post-processing products" | 8 results (45-47% similarity), mostly EE2 file naming standards and the GFS Configuration overview | 3/5 | The GFS Configuration doc confirmed the operational pipeline structure (analysis → forecast → post-processing → verification) and that downstream jobs like AWIPS and GEMPAK are "not normally run in developmental tests", which is exactly why the extended case is special. |
| 9 | list_job_scripts | search="GEMPAK" | Found 7 GEMPAK jobs: JGDAS_ATMOS_GEMPAK, JGDAS_ATMOS_GEMPAK_META_NCDC, JGFS_ATMOS_GEMPAK, JGFS_ATMOS_GEMPAK_META, JGFS_ATMOS_GEMPAK_NCDC_UPAPGIF, JGFS_ATMOS_GEMPAK_PGRB2_SPEC, JGLOBAL_WAVE_GEMPAK | 4/5 | Confirmed the scope of GEMPAK jobs activated by DO_GEMPAK: YES. A substantial set of 7 jobs, explaining why taskthrottle needs to increase from 25 to 75. |
| 10 | get_job_details | job_name="JGFS_ATMOS_PGRB2_SPEC_NPOESS" | Path, 53 lines, sources jjob_standard_vars.sh and jjob_shell_setup.sh, outputs COMOUT_ATMOS_GOES, uses OMP_NUM_THREADS. ChromaDB found the TODO comment: "This job is not part of the rocoto suite" | 4/5 | Confirmed the NPOESS job exists and produces GOES-related output. The TODO comment about not being in the rocoto suite is interesting context: DO_NPOESS in the extended config may be testing a job that is still being integrated. |
| 11 | search_architecture | query="GFS post-processing products GOES BUFRSND GEMPAK AWIPS NPOESS downstream jobs" | 5 results, all low relevance (negative scores): AQM, CMAQ, and GSI communities rather than GFS post-processing | 1/5 | The architecture search (community summaries) has no community specifically about GFS post-processing products. This is a gap in the hierarchical community structure: the downstream product generation pipeline is not well represented as a distinct subsystem. |

Summary Statistics

| Metric | Value |
|--------|-------|
| Total MCP tool calls | 11 |
| Calls scoring 4-5/5 | 6 (55%) |
| Calls scoring 3/5 | 2 (18%) |
| Calls scoring 1-2/5 | 3 (27%) |
| Average score | 3.2 / 5.0 |

Non-MCP Sources Used

In addition to MCP tools, the following direct file reads were essential:

| Source | Method | Contribution |
|--------|--------|--------------|
| C96_atm3DVar_extended.yaml | User-provided in chat | The primary artifact being analyzed |
| C96_atm3DVar.yaml | readFile (Kiro native) | Side-by-side comparison revealed the key differences: edate, interval, yaml reference, skip_ci_on_hosts, taskthrottle |
| gfs_extended_ci.yaml | readFile (Kiro native) | The referenced config showing all the DO_* flags and FHMAX_GFS: 384 |
| gfs_metp_ci.yaml | readFile (Kiro native) | The standard config showing only DO_METP: YES — the minimal baseline |
| C48_S2SW_extended.yaml | readFile (Kiro native) | Confirmed the "extended" pattern is consistent across case types (same gfs_extended_ci.yaml, same skip list) |
| CI cases directory listing | listDirectory (Kiro native) | Showed the full set of 21 CI cases, revealing only 2 have "extended" variants |
| 3 Copilot instruction files | readFile (Kiro native) | Provided domain context: system architecture, job-to-script patterns, CI framework docs |

Analysis: What Worked and What Didn't

What Worked Well

  1. find_env_dependencies (Call #7) was the star performer. By tracing FHMAX_GFS through the Neo4j graph, it connected the config setting to actual execution scripts and jobs, proving the extended case activates NPOESS, GEMPAK, and other downstream processing.

  2. get_job_details (Call #10) provided concrete evidence about the NPOESS job structure, confirming it's a real post-processing job with specific outputs.

  3. list_job_scripts (Call #9) quantified the GEMPAK job family (7 jobs), explaining the taskthrottle increase.

  4. Health/status tools (Calls #1-3) were essential for confirming the system was operational before relying on search results.

What Didn't Work Well

  1. explain_with_context (Call #4) returned essentially nothing. The topic was too specific for the hybrid search to find relevant content.

  2. search_architecture (Call #11) returned irrelevant communities. The GFS post-processing pipeline isn't represented as a distinct community in the hierarchical graph — this is a coverage gap worth addressing in future ingestion.

  3. search_documentation (Calls #5, #6) returned moderate-relevance results. The CI case YAML files themselves aren't in the ChromaDB corpus (they're config files, not documentation), so semantic search could only find surrounding documentation about the CI framework.

Key Insight for AWS Port

The most valuable MCP tools for this type of question were the graph-based tools (find_env_dependencies, get_job_details, list_job_scripts) rather than the semantic search tools (search_documentation, explain_with_context, search_architecture). This suggests that when porting to AWS:

  • Neptune (graph DB) will be the primary value driver for code understanding queries
  • OpenSearch (vector DB) will be more useful for documentation-style questions
  • The hybrid approach (graph + vector) remains important, but graph tools should be prioritized for code-structural questions

This report serves as a baseline quality benchmark for the MCP/RAG system. When the system is ported to AWS (Bedrock Knowledge Base + OpenSearch + Neptune), this same question should be re-run to compare answer quality, tool effectiveness, and response latency.

Generated by Kiro IDE using eib-mcp-gateway MCP tools on March 20, 2026.