MCP RAG Platform 32 Day Achievement Synopsis - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki
MCP-RAG Platform: 32-Day Achievement Synopsis
Period: January 30 – March 3, 2026
Versions: v7.10.0 → v7.25.1 (16 releases)
SDD Sessions Completed: 14 (0 abandoned)
Five Major Breakthroughs
1. Hierarchical GraphRAG Community Structure — From Flat Labels to a 4-Level Navigable Knowledge Graph
(Phases 24E-5, 24E-6 — v7.20.0, v7.22.0, v7.23.0)
The Neo4j graph went from zero Community nodes to 1,036 materialized communities across 4 hierarchical levels (L0: 694, L1: 175, L2: 86, L3: 81) with 21,559 MEMBER_OF, 978 PARENT_OF, and 1,297 INTERACTS_WITH relationships. A three-script offline pipeline then replaced all 828 template-based keyword summaries with LLM-generated narrative summaries via GitHub Models API using 10-model rotation (gpt-4o-mini, gpt-4.1, DeepSeek-R1, Llama-3.1-405B, and others). Queries like "How does data assimilation interact with the forecast model?" now traverse the full community hierarchy instead of relying on text similarity alone.
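As a rough illustration of what walking the hierarchy means, the sketch below climbs from a leaf community to its top-level ancestor using an in-memory lookup. The community identifiers and the `PARENT_BY_CHILD` mapping are hypothetical stand-ins for PARENT_OF edges; the platform itself stores these in Neo4j, and its actual query logic is not shown in this synopsis.

```python
from typing import Dict, List, Optional

# Community counts per level, as reported in the paragraph above.
LEVEL_COUNTS: Dict[str, int] = {"L0": 694, "L1": 175, "L2": 86, "L3": 81}

# Hypothetical child -> parent lookup standing in for PARENT_OF edges.
# These community names are invented for illustration only.
PARENT_BY_CHILD: Dict[str, Optional[str]] = {
    "L0:example_observer": "L1:example_assimilation",
    "L1:example_assimilation": "L2:example_analysis",
    "L2:example_analysis": "L3:example_workflow",
    "L3:example_workflow": None,
}


def ancestors(community_id: str, parent_by_child: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from a community up to its top-level ancestor."""
    path = [community_id]
    while True:
        parent = parent_by_child.get(path[-1])
        if parent is None:
            return path
        if parent in path:
            # Guard against malformed data that would loop forever.
            raise ValueError(f"cycle detected at {parent}")
        path.append(parent)
```

Calling `ancestors("L0:example_observer", PARENT_BY_CHILD)` yields one identifier per level, and `sum(LEVEL_COUNTS.values())` reproduces the 1,036 total.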
2. Cross-Language Graph Integration — Shell, Fortran, and Python Unified in a Single Traversable Graph
(Phases 24F, 24I, 27F-G, 27H, 27I, 27J — v7.15.0 through v7.19.0)
Five interconnected SDD phases eliminated the most critical blind spot in the knowledge graph: language boundaries. 383 ShellScript nodes were ingested (89 J-Jobs, 130 ex-scripts, 164 ush), 624 PythonModules with 3,267 functions were linked via 67 INVOKES edges, and Shell→Fortran EXECUTES bridges grew from 3 to 48 (16× improvement). Duplicate node cleanup (383→264 scripts), external Fortran placeholder resolution (11 NCEPLIBS/GSI/UFS_UTILS programs), and a new trace_full_execution_chain MCP tool now enable end-to-end traces from a J-Job through ex-scripts into Fortran executables — the first time an AI agent can follow the complete operational execution path across all three languages.
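The idea of an end-to-end trace can be sketched as a depth-first walk over typed edges. This is a minimal in-memory illustration, not the implementation of trace_full_execution_chain: the node names are invented, and the `CALLS` relationship between a J-Job and its ex-script is an assumed label (only EXECUTES and INVOKES are named in this synopsis).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (relationship type, target node)

# Hypothetical graph fragment; all node names are illustrative.
GRAPH: Dict[str, List[Edge]] = {
    "JEXAMPLE_FORECAST": [("CALLS", "exexample_forecast.sh")],
    "exexample_forecast.sh": [
        ("CALLS", "example_helper.sh"),
        ("EXECUTES", "example_model.x"),
    ],
    "example_helper.sh": [("INVOKES", "example_module.py")],
}


def trace_chains(start: str, graph: Dict[str, List[Edge]]) -> List[List[str]]:
    """Return every path from start to a leaf, with edge labels interleaved."""
    chains: List[List[str]] = []

    def walk(node: str, path: List[str], visited: Set[str]) -> None:
        # Skip edges that would revisit a node already on this path.
        edges = [(rel, tgt) for rel, tgt in graph.get(node, []) if tgt not in visited]
        if not edges:
            chains.append(path)
            return
        for rel, target in edges:
            walk(target, path + [f"-[{rel}]->", target], visited | {target})

    walk(start, [start], {start})
    return chains
```

Joining one chain with spaces gives a readable trace such as `JEXAMPLE_FORECAST -[CALLS]-> exexample_forecast.sh -[EXECUTES]-> example_model.x`.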
3. Agentic MCP Tool Surface — 5 Purpose-Built GraphRAG Tools + Session State Tracking
(Phases 24H, 24H-3, 31 — v7.11.0, v7.14.0, v7.21.0)
The platform's tool count grew from 39 to 44 production tools with five new GraphRAG-native instruments: get_code_context (single-call full neighborhood + community + callers), search_architecture (community-level semantic search), find_similar_code, get_change_impact (blast-radius with risk scoring), and trace_data_flow. Phase 24H-3 added 4 session state tools (mark_as_modified, get_session_context, checkpoint_state, restore_checkpoint) enabling agents to track file modifications, examined symbols, and create recovery checkpoints across long-running refactoring sessions. Phase 31 replaced the dormant ISD approval model with a lightweight session-oriented tracking system persisted to filesystem (active_session.json + history.jsonl).
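One way to picture the filesystem-backed session tracking is a small store that rewrites a snapshot file and appends to an event log. The sketch below borrows the tool names and the two filenames from the paragraph above, but the class, the JSON layout, and the event fields are assumptions made for illustration; the real tools may behave differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class SessionStore:
    """Hypothetical session tracker: a JSON snapshot plus a JSONL event log."""

    def __init__(self, root: Path) -> None:
        self.active = root / "active_session.json"
        self.history = root / "history.jsonl"
        self.state: Dict[str, Any] = {"modified_files": [], "checkpoints": {}}

    def _save(self, event: Dict[str, Any]) -> None:
        # Rewrite the snapshot, then append one line to the history log.
        self.active.write_text(json.dumps(self.state, indent=2))
        with self.history.open("a") as fh:
            fh.write(json.dumps(event) + "\n")

    def mark_as_modified(self, path: str) -> None:
        if path not in self.state["modified_files"]:
            self.state["modified_files"].append(path)
        self._save({"event": "modified", "path": path})

    def checkpoint_state(self, name: str) -> None:
        self.state["checkpoints"][name] = list(self.state["modified_files"])
        self._save({"event": "checkpoint", "name": name})

    def restore_checkpoint(self, name: str) -> None:
        # Raises KeyError if the checkpoint name is unknown.
        self.state["modified_files"] = list(self.state["checkpoints"][name])
        self._save({"event": "restore", "name": name})


def demo() -> Dict[str, Any]:
    """Exercise the store in a temporary directory and report what was persisted."""
    with tempfile.TemporaryDirectory() as tmp:
        store = SessionStore(Path(tmp))
        store.mark_as_modified("ush/example.sh")
        store.checkpoint_state("before-refactor")
        store.mark_as_modified("jobs/JEXAMPLE")
        store.restore_checkpoint("before-refactor")
        on_disk = json.loads(store.active.read_text())
        events = store.history.read_text().splitlines()
        return {"modified": on_disk["modified_files"], "event_count": len(events)}
```

Keeping the snapshot and the append-only log separate means the current state can be read in one call while the full sequence of events stays available for auditing.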
4. NCEPLIBS Documentation Ingestion + GraphRAG Integration Specification
(Phase 34, v7.24.0, v7.25.0)
The RAG knowledge base expanded by 10 NCEPLIBS Doxygen documentation sources (bufr, ip, w3emc, g2, bacio, g2tmpl, nemsio, sfcio, sigio, wgrib2) with a new Doxygen-aware content extraction pipeline that strips boilerplate before chunking. Total enabled sources grew from 15 to 25. A comprehensive SDD specification (Phase 34) was authored to close the complete NCEPLIBS graph gap: today 214 Library nodes are all internal GW targets with zero ExternalLibrary representation — the spec defines 4 phases (34A-D) to ingest 11 NCEPLIBS repos (~5-8K new nodes), create ExternalLibrary/PlatformVersion node types, bridge USE edges to their providing libraries, and link 1,747 Doxygen docs to Neo4j subroutine nodes.
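A boilerplate-stripping step of this kind might look like the following sketch, which drops text inside navigation-style containers and then splits what remains into fixed-size word chunks. The set of tags treated as boilerplate, the chunking scheme, and both function names are assumptions; the platform's actual Doxygen-aware extractor is not described in enough detail here to reproduce.

```python
from html.parser import HTMLParser
from typing import List

# Assumed boilerplate containers; a real Doxygen page may need other rules.
SKIP_TAGS = {"script", "style", "nav", "footer"}


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page text with assumed boilerplate regions removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    if size <= 0:
        raise ValueError("size must be positive")
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```

Stripping before chunking keeps menu and footer text out of the embedded chunks, so they do not dilute similarity search results.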
5. Instruction File Architecture + Tool Parameter Synchronization — AI Agents That Read the Right Docs
(Phases 29, 32 — v7.20.1, v7.20.2)
A systematic audit revealed that 8+ tool parameters had drifted between source code and instruction files, causing "must have required property" errors for AI agents. Phase 29 added backward-compatible parameter aliases across 8 tools in 4 modules, created an auto-documentation script (generate-tool-docs.js --check) that validates instruction files against live schemas, and expanded the Quick Reference table from 25 to 33 tools. Phase 32 formalized a 5-file instruction architecture across 2 repositories with conditional loading: copilot-instructions.md always loads, while *.instructions.md files load only when the MCP server is connected, achieving a ~35% context window reduction for non-MCP work. This closed the gap between what the platform can do and what AI agents know it can do.
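The two mechanisms described above, parameter aliases and a docs-versus-schema check, can be sketched as follows. This is written in Python for illustration only (the real check is a JavaScript script), and the alias names, parameter names, and both functions are hypothetical; only the tool name get_code_context and the error wording come from this page.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical legacy -> canonical parameter names.
ALIASES: Dict[str, str] = {"file_path": "path", "symbol_name": "symbol"}


def normalize_args(
    args: Dict[str, Any], aliases: Dict[str, str], required: Iterable[str]
) -> Dict[str, Any]:
    """Map legacy parameter names to canonical ones, then check required keys."""
    normalized: Dict[str, Any] = {}
    for key, value in args.items():
        canonical = aliases.get(key, key)
        if canonical in normalized and normalized[canonical] != value:
            raise ValueError(f"conflicting values for '{canonical}'")
        normalized[canonical] = value
    for name in required:
        if name not in normalized:
            raise ValueError(f"must have required property '{name}'")
    return normalized


def find_drift(
    documented: Dict[str, Set[str]], live: Dict[str, Set[str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Report, per tool, parameters that differ between docs and live schemas."""
    report: Dict[str, Dict[str, List[str]]] = {}
    for tool, live_params in live.items():
        doc_params = documented.get(tool, set())
        missing = sorted(live_params - doc_params)
        stale = sorted(doc_params - live_params)
        if missing or stale:
            report[tool] = {"missing_from_docs": missing, "stale_in_docs": stale}
    return report
```

With aliases in place, a call that still uses an old name such as `file_path` is accepted and rewritten to `path`, while `find_drift` flags any tool whose documented parameters no longer match its schema.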
By the Numbers
| Metric | 32 Days Ago | Today | Change |
|---|---|---|---|
| MCP Tools | 39 | 44 | +5 |
| Neo4j Relationships | ~485K | 589,396 | +104K |
| Community Nodes | 0 | 1,036 | +1,036 |
| Community Summaries | 63 (flat) | 828 (hierarchical, LLM) | +765 |
| ChromaDB Documents | ~60K | 66,552 | +6.5K |
| Shell→Fortran EXECUTES | 3 | 48 | 16× |
| ShellScript Nodes | 0 | 264 | +264 |
| SDD Sessions (period) | — | 14 completed | 0 abandoned |
| Changelog Versions | — | 16 releases | v7.10.0→v7.25.1 |