MCP RAG Platform 32 Day Achievement Synopsis - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki

MCP-RAG Platform: 32-Day Achievement Synopsis

Period: January 30 – March 3, 2026
Versions: v7.10.0 → v7.25.1 (16 releases)
SDD Sessions Completed: 14 (0 abandoned)


Five Major Breakthroughs

1. Hierarchical GraphRAG Community Structure — From Flat Labels to a 4-Level Navigable Knowledge Graph

(Phases 24E-5, 24E-6 — v7.20.0, v7.22.0, v7.23.0)

The Neo4j graph went from zero Community nodes to 1,036 materialized communities across 4 hierarchical levels (L0: 694, L1: 175, L2: 86, L3: 81), connected by 21,559 MEMBER_OF, 978 PARENT_OF, and 1,297 INTERACTS_WITH relationships. A three-script offline pipeline then replaced all 828 template-based keyword summaries with LLM-generated narrative summaries, produced through the GitHub Models API with a 10-model rotation (gpt-4o-mini, gpt-4.1, DeepSeek-R1, Llama-3.1-405B, and others). Queries such as "How does data assimilation interact with the forecast model?" now traverse the full community hierarchy instead of relying on text similarity alone.
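As a rough illustration of what traversing the hierarchy means, the sketch below resolves an entity to its chain of enclosing communities and finds the lowest community two entities share. It uses a tiny in-memory stand-in for the graph. The entity names, community IDs, single-membership simplification, and the assumption that PARENT_OF points from a parent community to its child are all hypothetical and not taken from the platform's schema.

```python
from typing import Dict, List, Optional

# Hypothetical miniature graph. In the platform these would be MEMBER_OF
# and PARENT_OF relationships stored in Neo4j; all names here are invented.
MEMBER_OF: Dict[str, str] = {
    "analysis_script": "L0-analysis",
    "forecast_script": "L0-forecast",
}

# Child community -> parent community (assumed: parent PARENT_OF child).
PARENT_BY_CHILD: Dict[str, str] = {
    "L0-analysis": "L1-data-assimilation",
    "L0-forecast": "L1-forecast-model",
    "L1-data-assimilation": "L2-cycle",
    "L1-forecast-model": "L2-cycle",
    "L2-cycle": "L3-workflow",
}


def community_path(entity: str) -> List[str]:
    """Return the communities enclosing `entity`, from level 0 upward."""
    path: List[str] = []
    current = MEMBER_OF.get(entity)
    while current is not None:
        path.append(current)
        current = PARENT_BY_CHILD.get(current)
    return path


def lowest_common_community(a: str, b: str) -> Optional[str]:
    """Return the first community on a's path that also encloses b."""
    ancestors_of_b = set(community_path(b))
    for community in community_path(a):
        if community in ancestors_of_b:
            return community
    return None
```

In this toy graph the two scripts share no level-0 or level-1 community, so an architectural question about how they relate would be answered at level 2, where both paths meet.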

2. Cross-Language Graph Integration — Shell, Fortran, and Python Unified in a Single Traversable Graph

(Phases 24F, 24I, 27F-G, 27H, 27I, 27J — v7.15.0 through v7.19.0)

Five interconnected SDD phases eliminated the most critical blind spot in the knowledge graph: language boundaries. The work ingested 383 ShellScript nodes (89 J-Jobs, 130 ex-scripts, 164 ush scripts), linked 624 PythonModules containing 3,267 functions via 67 INVOKES edges, and grew Shell→Fortran EXECUTES bridges from 3 to 48 (a 16× increase). Duplicate node cleanup (383→264 scripts), external Fortran placeholder resolution (11 NCEPLIBS/GSI/UFS_UTILS programs), and a new trace_full_execution_chain MCP tool now enable end-to-end traces from a J-Job through ex-scripts into Fortran executables. This is the first time an AI agent can follow the complete operational execution path across all three languages.
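The kind of traversal an end-to-end trace involves can be sketched as a breadth-first walk over typed edges. This is not the implementation of trace_full_execution_chain. The node names are invented, CALLS is an assumed relationship name for the J-Job to ex-script hop, and the source of the INVOKES edge is likewise an assumption.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, relationship, target)

# Hypothetical edges spanning shell, Fortran, and Python nodes.
EDGES: List[Edge] = [
    ("JJOB_EXAMPLE", "CALLS", "ex_example.sh"),          # assumed relationship name
    ("ex_example.sh", "EXECUTES", "example_fortran.x"),  # Shell -> Fortran bridge
    ("ex_example.sh", "INVOKES", "example_module.py"),   # assumed Shell -> Python link
]


def build_adjacency(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Group outgoing (relationship, target) pairs by source node."""
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    for source, rel, target in edges:
        adjacency.setdefault(source, []).append((rel, target))
    return adjacency


def trace_chain(start: str, edges: List[Edge]) -> List[Edge]:
    """Breadth-first walk from `start`, returning every hop in visit order."""
    adjacency = build_adjacency(edges)
    seen = {start}
    queue = deque([start])
    hops: List[Edge] = []
    while queue:
        node = queue.popleft()
        for rel, target in adjacency.get(node, []):
            hops.append((node, rel, target))
            if target not in seen:  # guard against cycles
                seen.add(target)
                queue.append(target)
    return hops
```

Calling trace_chain("JJOB_EXAMPLE", EDGES) yields the J-Job hop first, then both hops out of the ex-script, so a single result crosses all three languages.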

3. Agentic MCP Tool Surface — 5 Purpose-Built GraphRAG Tools + Session State Tracking

(Phases 24H, 24H-3, 31 — v7.11.0, v7.14.0, v7.21.0)

The platform's tool count grew from 39 to 44 production tools with five new GraphRAG-native instruments: get_code_context (single-call full neighborhood + community + callers), search_architecture (community-level semantic search), find_similar_code, get_change_impact (blast-radius with risk scoring), and trace_data_flow. Phase 24H-3 added 4 session state tools (mark_as_modified, get_session_context, checkpoint_state, restore_checkpoint) that let agents track file modifications and examined symbols, and create recovery checkpoints across long-running refactoring sessions. Phase 31 replaced the dormant ISD approval model with a lightweight session-oriented tracking system persisted to the filesystem (active_session.json + history.jsonl).
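A file-backed session tracker of this shape could look roughly like the sketch below: the current state lives in one JSON file and every change is appended to a JSON Lines log. Only the two file names come from the text above. The class, its fields, and the record layout are hypothetical, and the method merely borrows the name of the mark_as_modified tool without reflecting its real behavior.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict


class SessionStore:
    """Hypothetical tracker: current state in JSON, event log in JSON Lines."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.active_path = self.root / "active_session.json"
        self.history_path = self.root / "history.jsonl"

    def load(self) -> Dict[str, Any]:
        """Read the active session, or start an empty one."""
        if self.active_path.exists():
            return json.loads(self.active_path.read_text(encoding="utf-8"))
        return {"modified_files": []}

    def _record(self, event: str, payload: Dict[str, Any]) -> None:
        """Append one event per line so the history is never rewritten."""
        entry = {"time": time.time(), "event": event, **payload}
        with self.history_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(entry) + "\n")

    def mark_as_modified(self, file_path: str) -> Dict[str, Any]:
        """Add a file to the active session (once) and log the event."""
        state = self.load()
        if file_path not in state["modified_files"]:
            state["modified_files"].append(file_path)
        self.active_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
        self._record("mark_as_modified", {"file": file_path})
        return state
```

Keeping the append-only log separate from the small state file means a session can be resumed from active_session.json alone, while history.jsonl preserves the full sequence of events.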

4. NCEPLIBS Documentation Ingestion + GraphRAG Integration Specification

(Phase 34 — v7.24.0, v7.25.0)

The RAG knowledge base expanded by 10 NCEPLIBS Doxygen documentation sources (bufr, ip, w3emc, g2, bacio, g2tmpl, nemsio, sfcio, sigio, wgrib2), supported by a new Doxygen-aware content extraction pipeline that strips boilerplate before chunking. Total enabled sources grew from 15 to 25. A comprehensive SDD specification (Phase 34) was authored to close the NCEPLIBS graph gap. Today all 214 Library nodes are internal GW targets, with zero ExternalLibrary representation. The spec defines 4 phases (34A-D) to ingest 11 NCEPLIBS repos (~5-8K new nodes), create ExternalLibrary/PlatformVersion node types, bridge USE edges to their providing libraries, and link 1,747 Doxygen docs to Neo4j subroutine nodes.
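To show what stripping boilerplate before chunking can mean in practice, here is a minimal sketch using only the Python standard library. It assumes the useful text of a Doxygen HTML page sits inside a div with class "contents" and that navigation and footer text sit outside it. That layout, the sample markup, and the word-based chunker are illustrative assumptions, not a description of the platform's pipeline.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ContentsExtractor(HTMLParser):
    """Collect text inside <div class="contents"> and ignore everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # div nesting depth inside the contents block; 0 = outside
        self.parts: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested div inside the contents block
        elif "contents" in classes:
            self.depth = 1   # entering the contents block

    def handle_endtag(self, tag: str) -> None:
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data: str) -> None:
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the text of the contents block with page chrome removed."""
    parser = ContentsExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

Extracting first and chunking second keeps repeated navigation and footer text out of every chunk, so it does not compete with real documentation during retrieval.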

5. Instruction File Architecture + Tool Parameter Synchronization — AI Agents That Read the Right Docs

(Phases 29, 32 — v7.20.1, v7.20.2)

A systematic audit revealed that 8+ tool parameters had drifted between source code and instruction files, causing "must have required property" errors for AI agents. Phase 29 added backward-compatible parameter aliases across 8 tools in 4 modules, created an auto-documentation script (generate-tool-docs.js --check) that validates instruction files against live schemas, and expanded the Quick Reference table from 25 to 33 tools. Phase 32 formalized a 5-file instruction architecture across 2 repositories with conditional loading: copilot-instructions.md always loads, while *.instructions.md files load only when the MCP server is connected, achieving a ~35% context window reduction for non-MCP work. This closed the gap between what the platform can do and what AI agents know it can do.
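The two ideas in this phase, backward-compatible aliases and a documentation drift check, can be sketched in a few lines. This is not the logic of generate-tool-docs.js. The tool name, parameter names, alias table, and message wording below are all hypothetical.

```python
from typing import Any, Dict, List, Set

# Hypothetical alias table: outdated documented name -> current schema name.
PARAM_ALIASES: Dict[str, str] = {"file": "file_path", "symbol": "symbol_name"}


def normalize_params(params: Dict[str, Any],
                     aliases: Dict[str, str] = PARAM_ALIASES) -> Dict[str, Any]:
    """Rewrite aliased keys to canonical names; an explicit canonical key wins."""
    normalized: Dict[str, Any] = {}
    for key, value in params.items():
        canonical = aliases.get(key, key)
        if key != canonical and canonical in normalized:
            continue  # canonical value already supplied, ignore the alias
        normalized[canonical] = value
    return normalized


def find_drift(documented: Dict[str, Set[str]],
               live: Dict[str, Set[str]]) -> List[str]:
    """List parameters that appear in the docs or the live schema, but not both."""
    problems: List[str] = []
    for tool in sorted(set(documented) | set(live)):
        doc_params = documented.get(tool, set())
        live_params = live.get(tool, set())
        for name in sorted(doc_params - live_params):
            problems.append(f"{tool}: documented parameter '{name}' is not in the live schema")
        for name in sorted(live_params - doc_params):
            problems.append(f"{tool}: live parameter '{name}' is undocumented")
    return problems
```

Aliases keep older agent prompts working at call time, while a drift check run in CI catches the mismatch at its source. A check mode would typically exit non-zero whenever find_drift returns a non-empty list.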


By the Numbers

| Metric | 32 Days Ago | Today | Change |
|---|---|---|---|
| MCP Tools | 39 | 44 | +5 |
| Neo4j Relationships | ~485K | 589,396 | +104K |
| Community Nodes | 0 | 1,036 | +1,036 |
| Community Summaries | 63 (flat) | 828 (hierarchical, LLM) | +765 |
| ChromaDB Documents | ~60K | 66,552 | +6.5K |
| Shell→Fortran EXECUTES | 3 | 48 | 16× |
| ShellScript Nodes | 0 | 264 | +264 |
| SDD Sessions (period) | | 14 completed | 0 abandoned |
| Changelog Versions | | 16 releases | v7.10.0→v7.25.1 |