ADVANCED_FUTURE_WORK - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki

Advanced Future Work: MCP/RAG System Evolution

Document Type: Strategic Development Roadmap
Target: Q2 2026 Funding Cycle
Status: Draft for Review
Last Updated: January 6, 2026
Authors: EIB Development Team


Executive Summary

The MCP/RAG system has achieved operational capability with 38 tools, hybrid retrieval (ChromaDB + Neo4j), and EE2 compliance validation. This document outlines the next evolutionary phase: intelligent, self-improving AI assistance that learns from operational history, understands visual system representations, and provides truly graph-aware semantic reasoning.

Three transformational initiatives are proposed:

| Initiative | Impact | Complexity | Timeline |
| --- | --- | --- | --- |
| Multi-Modal Visual Understanding | High | Medium | Q2 2026 |
| Self-Learning from CI/CD History | Very High | High | Q2-Q3 2026 |
| True GraphRAG Fusion | Transformational | High | Q3 2026 |

Estimated Team Requirement: 3-4 FTEs + LLM fine-tuning expertise
Infrastructure: Existing + GPU compute for training


1. Multi-Modal Visual Understanding

1.1 Problem Statement

The Global Workflow system has extensive visual documentation:

  • Rocoto XML DAG flowcharts showing job dependencies
  • System architecture diagrams maintained by SMEs
  • UML-style component diagrams for subsystems (GSI, UFS, UPP)
  • Operational runbooks with embedded decision flowcharts

Current RAG systems treat these as opaque images or skip them entirely. Yet these diagrams often contain the most accurate, up-to-date representation of system relationships.

1.2 Proposed Solution

Leverage multi-modal LLM capabilities (GPT-4V, Claude 3.5 Sonnet, Gemini Pro Vision) to:

  1. Index Visual Content: Extract semantic understanding from flowcharts during ingestion
  2. Runtime Visual Queries: Allow operators to ask "What does this diagram show?" with image input
  3. Cross-Reference: Link visual elements to code entities in Neo4j graph

1.3 Technical Architecture

┌─────────────────────────────────────────────────────────────────┐
│                 Multi-Modal Ingestion Pipeline                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐   │
│  │  Flowchart   │───▶│  Vision LLM  │───▶│ Structured JSON  │   │
│  │  PNG/SVG     │    │  Extraction  │    │ (nodes, edges)   │   │
│  └──────────────┘    └──────────────┘    └────────┬─────────┘   │
│                                                   │             │
│  ┌──────────────┐    ┌──────────────┐             │             │
│  │  Rocoto XML  │───▶│  DAG Parser  │────────────▶│             │
│  │  Workflow    │    │              │             │             │
│  └──────────────┘    └──────────────┘             ▼             │
│                                          ┌──────────────────┐   │
│                                          │   Neo4j Graph    │   │
│                                          │   (visual_node)  │   │
│                                          └────────┬─────────┘   │
│                                                   │             │
│                                                   ▼             │
│                                          ┌──────────────────┐   │
│                                          │   ChromaDB       │   │
│                                          │ (visual_context) │   │
│                                          └──────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

1.4 Key Deliverables

  • Visual content ingestion pipeline for PNG, SVG, PDF diagrams
  • Neo4j schema extension: (:VisualElement)-[:DEPICTS]->(:CodeEntity)
  • New MCP tool: analyze_visual_diagram
  • ChromaDB collection: global-workflow-visuals-v1
  • Integration with existing SME-maintained flowcharts
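
As a rough illustration of how the structured JSON from the vision step might feed the proposed (:VisualElement)-[:DEPICTS]->(:CodeEntity) schema, the sketch below turns extracted nodes into parameterized Cypher statements. The JSON key names, the property names (`label`, `diagram`, `name`), the diagram file name, and the `to_cypher` helper are all assumptions made for illustration; none of them is an existing interface of the system.

```python
from typing import Any

# Hypothetical output of the vision-LLM extraction step. The key names
# ("diagram", "nodes", "label", "code_entity") and the file name are assumptions.
extracted = {
    "diagram": "gfs_v16_flowchart.png",
    "nodes": [
        {"label": "eupd", "code_entity": "eupd"},
        {"label": "esfc", "code_entity": "esfc"},
        {"label": "+06 cycle hour", "code_entity": None},  # no code counterpart
    ],
}

# Parameterized Cypher; MERGE keeps repeated ingestion runs idempotent.
MERGE_DEPICTS = (
    "MERGE (v:VisualElement {label: $label, diagram: $diagram}) "
    "MERGE (c:CodeEntity {name: $name}) "
    "MERGE (v)-[:DEPICTS]->(c)"
)
MERGE_VISUAL_ONLY = "MERGE (v:VisualElement {label: $label, diagram: $diagram})"


def to_cypher(doc: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return one (statement, parameters) pair per extracted node."""
    statements = []
    for node in doc["nodes"]:
        params = {"label": node["label"], "diagram": doc["diagram"]}
        if node.get("code_entity"):
            params["name"] = node["code_entity"]
            statements.append((MERGE_DEPICTS, params))
        else:
            # Visual element with no known code entity: store it unlinked.
            statements.append((MERGE_VISUAL_ONLY, params))
    return statements


statements = to_cypher(extracted)
linked = sum(1 for stmt, _ in statements if "DEPICTS" in stmt)
print(f"{linked}/{len(statements)} visual elements linked to code")
```

The ratio printed at the end corresponds to the cross-reference linkage metric in Section 1.5; the statements themselves could be passed to any Neo4j client that accepts a query string and a parameter map.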

1.5 Success Metrics

| Metric | Target |
| --- | --- |
| Diagram coverage | 80% of documented flowcharts indexed |
| Visual query accuracy | 85% correct entity extraction |
| Cross-reference linkage | 70% visual elements linked to code |

2. Self-Learning from CI/CD History

2.1 Vision

Transform the MCP/RAG system from a static knowledge retrieval system to a continuously learning system that improves from operational experience.

2.2 The Learning Loop

┌─────────────────────────────────────────────────────────────────────────┐
│                     Self-Learning Training Pipeline                     │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────┐         ┌─────────────┐         ┌─────────────┐       │
│   │  CI/CD      │         │  Git Log    │         │  GitHub PR  │       │
│   │  Error Logs │         │  History    │         │  Messages   │       │
│   └──────┬──────┘         └──────┬──────┘         └──────┬──────┘       │
│          │                       │                       │              │
│          └───────────────────────┼───────────────────────┘              │
│                                  ▼                                      │
│                   ┌─────────────────────────────┐                       │
│                   │   Training Data Generator   │                       │
│                   │   (Error → Solution Pairs)  │                       │
│                   └──────────────┬──────────────┘                       │
│                                  │                                      │
│                                  ▼                                      │
│           ┌────────────────────────────────────────────┐                │
│           │         MCP/RAG Attempt Resolution         │                │
│           │     (Generate candidate fix using RAG)     │                │
│           └──────────────────────┬─────────────────────┘                │
│                                  │                                      │
│                                  ▼                                      │
│           ┌────────────────────────────────────────────┐                │
│           │             Validation Against             │                │
│           │          Actual Git Commit/PR Fix          │                │
│           └──────────────────────┬─────────────────────┘                │
│                                  │                                      │
│                    ┌─────────────┴─────────────┐                        │
│                    ▼                           ▼                        │
│             ┌──────────────┐            ┌──────────────┐                │
│             │   Correct    │            │  Incorrect   │                │
│             │  (Reinforce) │            │   (Learn)    │                │
│             └──────┬───────┘            └──────┬───────┘                │
│                    │                           │                        │
│                    └─────────────┬─────────────┘                        │
│                                  ▼                                      │
│                   ┌─────────────────────────────┐                       │
│                   │     Fine-Tuning Dataset     │                       │
│                   │     (RLHF / DPO Format)     │                       │
│                   └──────────────┬──────────────┘                       │
│                                  │                                      │
│                                  ▼                                      │
│                   ┌─────────────────────────────┐                       │
│                   │   Domain-Adapted Weights    │                       │
│                   │   (LoRA / QLoRA Adapter)    │                       │
│                   └─────────────────────────────┘                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

2.3 Data Sources

| Source | Volume | Content |
| --- | --- | --- |
| Jenkins CI logs | ~50,000 builds/year | Build failures, test errors, deployment issues |
| GitHub Actions | ~10,000 runs/year | Workflow failures, linting errors |
| Git commit history | ~5,000 commits/year | Fixes, refactors, bug patches |
| GitHub PR discussions | ~800 PRs/year | Problem descriptions, review comments, solutions |
| Jira/Issue trackers | ~1,200 tickets/year | Bug reports, resolution notes |

2.4 Training Data Format

{
  "error_context": {
    "log_snippet": "FATAL: exglobal_atmos_analysis.py line 342: KeyError 'LEVS'",
    "job_name": "gfs_atmos_analysis_f000",
    "platform": "WCOSS2",
    "timestamp": "2025-08-15T06:32:00Z"
  },
  "rag_attempt": {
    "retrieved_docs": ["config_guide.md#environment-vars", "LEVS_param.rst"],
    "generated_fix": "Add LEVS=${LEVS:-127} to job card preamble",
    "confidence": 0.72
  },
  "actual_solution": {
    "commit_sha": "a1b2c3d4",
    "pr_number": 1247,
    "fix_description": "Export LEVS from parent script before job submission",
    "files_changed": ["scripts/exglobal_atmos_analysis.sh"]
  },
  "evaluation": {
    "rag_correct": false,
    "error_category": "environment_variable_propagation",
    "lesson": "LEVS must be exported, not just set, for subprocess inheritance"
  }
}

2.5 Fine-Tuning Strategy

Phase 1: Supervised Fine-Tuning (SFT)

  • Create instruction-following dataset from correct solutions
  • Fine-tune base model (Llama 3, Mistral, or domain-specific)
  • Target: Improve first-attempt accuracy on common error patterns

Phase 2: Reinforcement Learning from Human Feedback (RLHF)

  • Collect SME preferences on generated solutions
  • Train reward model on preference pairs
  • PPO optimization for solution quality

Phase 3: Direct Preference Optimization (DPO)

  • Simpler alternative to full RLHF
  • Use (RAG attempt, actual fix) pairs as preference data
  • Lower compute requirements, faster iteration
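
To make the Phase 3 idea concrete, the sketch below converts one record in the Section 2.4 format into a preference pair, with the actual fix as the preferred answer and the failed RAG attempt as the rejected one. The `prompt`/`chosen`/`rejected` layout is a common convention for preference datasets; the prompt wording and the `to_preference_pair` helper are assumptions, not part of the existing pipeline.

```python
from typing import Optional

# A trimmed record in the Section 2.4 training data format.
record = {
    "error_context": {
        "log_snippet": "FATAL: exglobal_atmos_analysis.py line 342: KeyError 'LEVS'",
        "job_name": "gfs_atmos_analysis_f000",
        "platform": "WCOSS2",
    },
    "rag_attempt": {"generated_fix": "Add LEVS=${LEVS:-127} to job card preamble"},
    "actual_solution": {
        "fix_description": "Export LEVS from parent script before job submission"
    },
    "evaluation": {"rag_correct": False},
}


def to_preference_pair(rec: dict) -> Optional[dict]:
    """Build a preference pair, or return None when there is no contrast."""
    if rec["evaluation"]["rag_correct"]:
        # The RAG attempt already matched the real fix, so there is no
        # rejected answer; such records are better suited to Phase 1 (SFT).
        return None
    ctx = rec["error_context"]
    prompt = (
        f"Job {ctx['job_name']} failed on {ctx['platform']}:\n"
        f"{ctx['log_snippet']}\nPropose a fix."
    )
    return {
        "prompt": prompt,
        "chosen": rec["actual_solution"]["fix_description"],
        "rejected": rec["rag_attempt"]["generated_fix"],
    }


pair = to_preference_pair(record)
print(pair["chosen"])
```

Records where the RAG attempt was correct are skipped here on purpose: without a rejected answer there is nothing for a preference objective to contrast.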

2.6 Deployment Architecture

┌─────────────────────────────────────────────────────────┐
│                     Inference Stack                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────────────┐     ┌─────────────────────────┐    │
│  │  Base LLM       │     │  Domain LoRA Adapter    │    │
│  │  (Claude/GPT)   │ ◀── │  (GFS/GEFS Expertise)   │    │
│  └────────┬────────┘     └─────────────────────────┘    │
│           │                                             │
│           ▼                                             │
│  ┌─────────────────────────────────────────────────┐    │
│  │              MCP/RAG Hybrid Layer               │    │
│  │  ChromaDB (semantic) + Neo4j (graph) + Adapter  │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
└─────────────────────────────────────────────────────────┘

2.7 Key Deliverables

  • CI/CD log parser and error categorizer
  • Git history analyzer (commit ↔ issue linkage)
  • Training data generation pipeline
  • Fine-tuning infrastructure (GPU cluster access)
  • LoRA adapter for GFS/GEFS domain
  • A/B testing framework for model comparison
  • Continuous learning pipeline (monthly retraining)
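
A first version of the CI/CD log parser and error categorizer could be as small as the sketch below. It assumes fatal lines follow the shape of the `log_snippet` in Section 2.4 (`FATAL: <script> line <n>: <message>`); the regular expressions, the `parse_fatal` helper, and every category name except `environment_variable_propagation` are hypothetical, and mapping an upper-case `KeyError` to that category is only a heuristic.

```python
import re
from typing import Optional

# Illustrative patterns only; a real categorizer would be derived from
# observed failures. First match wins.
CATEGORY_PATTERNS = [
    ("environment_variable_propagation", re.compile(r"KeyError '[A-Z_]+'")),
    ("missing_file", re.compile(r"No such file or directory")),
    ("timeout", re.compile(r"walltime|timed out", re.IGNORECASE)),
]

# Assumed log shape, taken from the Section 2.4 example record.
FATAL_LINE = re.compile(
    r"FATAL: (?P<script>\S+) line (?P<line>\d+): (?P<message>.+)"
)


def parse_fatal(log_line: str) -> Optional[dict]:
    """Extract script, line number, message and a coarse category."""
    match = FATAL_LINE.search(log_line)
    if match is None:
        return None  # not a fatal line in the assumed format
    message = match.group("message")
    category = "uncategorized"
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.search(message):
            category = name
            break
    return {
        "script": match.group("script"),
        "line": int(match.group("line")),
        "message": message,
        "error_category": category,
    }


parsed = parse_fatal("FATAL: exglobal_atmos_analysis.py line 342: KeyError 'LEVS'")
print(parsed["error_category"])
```

Lines that match the fatal format but none of the patterns fall into `uncategorized`, which gives SMEs a natural review queue for new error patterns.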

2.8 Success Metrics

| Metric | Baseline | Target |
| --- | --- | --- |
| First-attempt fix accuracy | 35% | 65% |
| Time to resolution (assisted) | 4.2 hours | 1.5 hours |
| SME satisfaction score | 3.2/5 | 4.5/5 |
| Novel error pattern recognition | 0% | 40% |

3. True GraphRAG Fusion

3.1 Current Limitation

Today's hybrid retrieval is parallel but disconnected:

Query → ChromaDB (semantic similarity) → Results A
Query → Neo4j (graph traversal) → Results B
Merge A + B → Final Results

The graph structure does NOT inform the semantic search, and vice versa.

3.2 GraphRAG Vision

Graph-Informed Semantic Search: Use relationship topology as a retrieval dimension.

Query: "How do I fix the sfcanl job when it fails on missing sea ice data?"

Traditional RAG:
  → Searches for "sfcanl" + "sea ice" + "missing" in vector space
  → May miss: seaice_analysis.py, prep_seaice.sh, ice_blend.F90

GraphRAG:
  → Finds sfcanl job node in Neo4j
  → Traverses: sfcanl -[DEPENDS_ON]-> prep_seaice -[CALLS]-> ice_blend
  → Expands semantic search to include 2-hop neighborhood
  → Retrieves documentation for ALL related components
  → Ranks by: semantic_score × graph_proximity_score

3.3 Technical Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                       GraphRAG Retrieval Engine                        │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │                        Query Understanding                        │ │
│  │  "Fix sfcanl sea ice failure" → entities: [sfcanl, sea_ice]       │ │
│  └───────────────────────────────┬───────────────────────────────────┘ │
│                                  │                                     │
│                 ┌────────────────┴────────────────┐                    │
│                 ▼                                 ▼                    │
│  ┌─────────────────────────────┐   ┌─────────────────────────────┐     │
│  │       Neo4j Expansion       │   │    ChromaDB Base Search     │     │
│  │                             │   │                             │     │
│  │  MATCH (n {name:'sfcanl'})  │   │  similarity_search(         │     │
│  │  -[*1..3]-(related)         │   │    "sfcanl sea ice failure" │     │
│  │  RETURN related.name,       │   │  )                          │     │
│  │         relationship_type,  │   │                             │     │
│  │         path_length         │   │  → [doc1, doc2, doc3, ...]  │     │
│  │                             │   │                             │     │
│  │  → [prep_seaice: 1 hop,     │   │                             │     │
│  │     ice_blend: 2 hops,      │   │                             │     │
│  │     CICE_config: 2 hops]    │   │                             │     │
│  └──────────────┬──────────────┘   └──────────────┬──────────────┘     │
│                 │                                 │                    │
│                 └────────────────┬────────────────┘                    │
│                                  ▼                                     │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │                       Graph-Aware Reranking                       │ │
│  │                                                                   │ │
│  │  final_score = α × semantic_sim + β × (1 / graph_distance) +      │ │
│  │                γ × relationship_weight                            │ │
│  │                                                                   │ │
│  │  Where:                                                           │ │
│  │    - semantic_sim: cosine similarity from ChromaDB                │ │
│  │    - graph_distance: shortest path length in Neo4j                │ │
│  │    - relationship_weight: CALLS > DEPENDS_ON > REFERENCES         │ │
│  │    - α, β, γ: learned or tuned weights                            │ │
│  └───────────────────────────────┬───────────────────────────────────┘ │
│                                  │                                     │
│                                  ▼                                     │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │                     Augmented Context Window                      │ │
│  │                                                                   │ │
│  │  [High relevance]  sfcanl_job_card.rst (0.92)                     │ │
│  │  [Graph neighbor]  prep_seaice.sh (0.78 + 1-hop bonus)            │ │
│  │  [Graph neighbor]  ice_blend.F90 (0.65 + 2-hop bonus)             │ │
│  │  [Semantic only]   seaice_overview.md (0.71)                      │ │
│  └───────────────────────────────────────────────────────────────────┘ │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

3.4 Implementation Phases

Phase 1: Entity Linking (Q2 2026)

  • Extract named entities from queries (job names, file names, functions)
  • Link to Neo4j nodes
  • Expand search scope to N-hop neighbors
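
The N-hop expansion in Phase 1 amounts to a bounded breadth-first search from the linked entity. The sketch below runs it over an in-memory edge list built from the Section 3.2 example; in the real system the edges would come from Neo4j, and the `n_hop_neighbors` helper is a hypothetical name.

```python
from collections import deque

# Edges from the Section 3.2 example, treated as undirected for expansion.
EDGES = [
    ("sfcanl", "prep_seaice"),     # DEPENDS_ON
    ("prep_seaice", "ice_blend"),  # CALLS
]


def n_hop_neighbors(edges, start, max_hops):
    """Map each node within max_hops of start to its hop distance."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    del hops[start]  # the start node is not its own neighbor
    return hops


expanded = n_hop_neighbors(EDGES, "sfcanl", max_hops=2)
print(expanded)  # {'prep_seaice': 1, 'ice_blend': 2}
```

The hop distances returned here are the same quantity that Phase 2 feeds into reranking as path length.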

Phase 2: Relationship-Weighted Scoring (Q2-Q3 2026)

  • Assign weights to relationship types (CALLS, IMPORTS, DEPENDS_ON)
  • Incorporate path length into reranking
  • A/B test against baseline hybrid retrieval
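
The reranking formula from Section 3.3 can be prototyped in a few lines before any tuning framework exists. In the sketch below the values of α, β, γ and the numeric relationship weights are placeholders chosen only to respect the ordering CALLS > DEPENDS_ON > REFERENCES; treating a document outside the graph neighborhood as zero proximity, and clamping distance 0 to 1, are further assumptions. The candidate scores come from the Augmented Context Window example, with `sfcanl_job_card.rst` assumed to document the query entity itself.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder weights; the roadmap leaves these to be learned or tuned.
RELATIONSHIP_WEIGHT = {"CALLS": 1.0, "DEPENDS_ON": 0.7, "REFERENCES": 0.4}
ALPHA, BETA, GAMMA = 0.6, 0.3, 0.1


@dataclass
class Candidate:
    name: str
    semantic_sim: float
    graph_distance: Optional[int] = None  # None: outside the graph neighborhood
    relationship: Optional[str] = None


def final_score(c: Candidate) -> float:
    """alpha * semantic_sim + beta * (1 / graph_distance) + gamma * relationship_weight"""
    if c.graph_distance is None:
        proximity = 0.0
    else:
        # Distance 0 (the query entity itself) is clamped to 1 to avoid
        # division by zero; this clamp is an assumption.
        proximity = 1.0 / max(c.graph_distance, 1)
    rel = RELATIONSHIP_WEIGHT.get(c.relationship, 0.0)
    return ALPHA * c.semantic_sim + BETA * proximity + GAMMA * rel


candidates = [
    Candidate("seaice_overview.md", 0.71),
    Candidate("ice_blend.F90", 0.65, graph_distance=2, relationship="CALLS"),
    Candidate("prep_seaice.sh", 0.78, graph_distance=1, relationship="DEPENDS_ON"),
    Candidate("sfcanl_job_card.rst", 0.92, graph_distance=0),
]

ranked = sorted(candidates, key=final_score, reverse=True)
for c in ranked:
    print(f"{final_score(c):.3f}  {c.name}")
```

With these placeholder weights, `ice_blend.F90` outranks `seaice_overview.md` even though its semantic similarity is lower, which is the behavior the A/B test against baseline hybrid retrieval would need to validate.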

Phase 3: Learned Graph Embeddings (Q3 2026)

  • Train graph neural network on Neo4j structure
  • Generate node embeddings that capture topological position
  • Fuse with text embeddings in joint vector space

Phase 4: Subgraph Retrieval (Q4 2026)

  • Return relevant subgraphs, not just nodes
  • Visualize dependency chains in responses
  • "Here's why sfcanl depends on sea ice processing: [interactive graph]"

3.5 Key Deliverables

  • Entity extraction and Neo4j linking module
  • Graph-aware reranking algorithm
  • Relationship weight tuning framework
  • Graph embedding training pipeline (optional Phase 3)
  • Updated MCP tools with GraphRAG backend
  • Evaluation benchmark: GraphRAG vs baseline

3.6 Success Metrics

| Metric | Baseline (Hybrid) | Target (GraphRAG) |
| --- | --- | --- |
| Recall@10 for related code | 62% | 85% |
| Cross-component issue resolution | 45% | 75% |
| "Missing context" user complaints | 23% of queries | <10% |
| Avg. relevant docs per query | 3.2 | 6.5 |

4. Additional Strategic Enhancements

4.1 Temporal Awareness

Goal: Understand time-sensitive aspects of weather operations.

  • Index by model cycle (00Z, 06Z, 12Z, 18Z)
  • Track documentation freshness (stale content warnings)
  • Version-aware retrieval ("Show me the GFS v16.3 config, not v17")
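
Two of these ideas reduce to small date computations, sketched below. Mapping a timestamp to the most recent cycle at or before it is an assumption about how jobs would be tagged, and the 180-day staleness threshold is a placeholder; `model_cycle` and `is_stale` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def model_cycle(timestamp: str) -> str:
    """Return the 00Z/06Z/12Z/18Z cycle at or before an ISO 8601 timestamp."""
    # Replace a trailing "Z" so older Python versions can parse the string.
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    moment = moment.astimezone(timezone.utc)
    return f"{moment.hour // 6 * 6:02d}Z"


def is_stale(last_verified: str, now: datetime, max_age_days: int = 180) -> bool:
    """Flag content whose last verification date is older than the threshold."""
    verified = datetime.fromisoformat(last_verified).replace(tzinfo=timezone.utc)
    return now - verified > timedelta(days=max_age_days)


# Timestamp taken from the Section 2.4 example record.
print(model_cycle("2025-08-15T06:32:00Z"))  # 06Z

now = datetime(2026, 1, 6, tzinfo=timezone.utc)
print(is_stale("2025-12-15", now))  # False
```

Both values could be stored as metadata on each indexed chunk, so that retrieval can filter by cycle and attach a stale-content warning without re-reading the source document.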

4.2 Confidence Calibration

Goal: Provide uncertainty quantification with every response.

Response: "Set FHMAX=384 for extended forecast"
Confidence: HIGH (0.94) - Found in 3 authoritative sources
Source agreement: 3/3 sources consistent
Last verified: 2025-12-15

4.3 Proactive Anomaly Detection

Goal: Warn before changes cause downstream issues.

  • Monitor code changes against Neo4j dependency graph
  • Alert: "Modifying exglobal_forecast.py impacts 12 downstream jobs"
  • Integration with PR review process

4.4 Execution Memory

Goal: Learn from past workflow executions.

  • Capture SDD workflow outcomes (success/failure/partial)
  • Build "lessons learned" knowledge base
  • Surface: "Last time X was attempted on WCOSS2, issue Y occurred"

5. Resource Requirements

5.1 Team Composition

| Role | FTE | Responsibilities |
| --- | --- | --- |
| ML Engineer | 1.5 | Fine-tuning, GraphRAG implementation |
| Backend Developer | 1.0 | Pipeline development, Neo4j/ChromaDB |
| DevOps/MLOps | 0.5 | Training infrastructure, CI/CD |
| Domain SME (part-time) | 0.5 | Validation, feedback, training data QA |
| Total | 3.5 FTE | |

5.2 Infrastructure

| Resource | Specification | Est. Cost/Month |
| --- | --- | --- |
| GPU Compute (training) | 4x A100 80GB | $8,000 |
| GPU Compute (inference) | 2x A10G | $2,000 |
| Storage (training data) | 2TB NVMe | $200 |
| Neo4j Enterprise (optional) | 32GB RAM cluster | $1,500 |
| ChromaDB (current) | Existing | $0 |
| Total | | ~$11,700/month |

5.3 Timeline

Q2 2026
├── Month 1: Multi-modal pipeline, CI/CD log parser
├── Month 2: Visual content indexing, training data generation
└── Month 3: GraphRAG Phase 1 (entity linking)

Q3 2026
├── Month 4: Fine-tuning infrastructure, first LoRA adapter
├── Month 5: GraphRAG Phase 2 (relationship scoring)
└── Month 6: A/B testing, evaluation benchmarks

Q4 2026
├── Month 7: RLHF/DPO training cycles
├── Month 8: GraphRAG Phase 3 (learned embeddings)
└── Month 9: Production deployment, documentation

6. Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Insufficient training data quality | Medium | High | SME review cycles, data augmentation |
| GPU compute availability | Medium | Medium | Cloud burst capacity, spot instances |
| Model hallucination on novel errors | High | Medium | Confidence thresholds, human-in-loop |
| Neo4j performance at scale | Low | Medium | Query optimization, caching |
| Scope creep | High | Medium | Strict phase gates, MVP focus |

7. Success Criteria for Funding Approval

7.1 Minimum Viable Outcomes (Must Achieve)

  • Visual diagram ingestion operational for top 20 flowcharts
  • Training data pipeline generating 1000+ error→solution pairs
  • GraphRAG entity linking showing measurable recall improvement
  • One fine-tuned adapter deployed for evaluation

7.2 Stretch Goals

  • Full RLHF training cycle completed
  • GraphRAG with learned embeddings
  • Proactive anomaly detection in PR workflow
  • Sub-2-hour assisted resolution time for common errors

8. Conclusion

The MCP/RAG system has proven the value of AI-assisted weather operations. The next phase transforms it from a knowledge retrieval tool into an intelligent learning partner that:

  1. Sees what operators see (visual understanding)
  2. Learns from every resolved issue (self-improvement)
  3. Understands system relationships deeply (GraphRAG)

This is not an incremental improvement; it is a paradigm shift toward autonomous operational intelligence for NOAA's critical weather forecasting infrastructure.


Appendix A: References

Appendix B: Related Internal Documents


Appendix C: Multi-Modal Proof of Concept - GFS v16 Flowchart Analysis

The following analysis demonstrates the power of multi-modal AI comprehension applied to the GFS v16 Global Model Parallel Sequencing flowchart (Fig. 4.1). This is exactly the kind of visual understanding that Initiative #1 will systematize.

GFS v16 Global Model Parallel Sequencing Flowchart

Figure: GFS v16 Schematic flow chart for operations - Source: Global Workflow Documentation


High-Level Architecture Insights

Three Major Swim Lanes:

  1. GDAS (left, green) - Global Data Assimilation System
  2. Hybrid EnKF (center, light green) - Ensemble Kalman Filter
  3. GFS (right, blue) - The forecast model itself

Plus two vertical control systems:

  • Workflow Manager (yellow, right edge)
  • Configuration Manager (yellow, far right)

Job Dependency Chain (GDAS Lane)

prep → waveinit → waveprep → fcst → postN → wavepostsbs → wavepostpnt
         ↓
        anal → analcalc → analdiag
                  ↓
                gldas
                  ↓
        fcst → echgres → vrfy → arch

Key observation: The GDAS side feeds into the Hybrid EnKF AND cycles at +06 hours.


Hybrid EnKF Data Flow

eobs → ediag → eupd → esfc → ecmn (ecenN metatask)
                 ↓
              efmn (efcsN) → epmn (eposN)
                              ↓
                           eamn (earcN)

The "N" suffix = metatask (runs multiple ensemble members in parallel)


GFS Lane Dependencies

prep → anal → analcalc → fcst → wavepostsbs
         ↓                        wavepostbndpnt
      waveinit → waveprep         wavepostbndpntbll
                                  wavepostpnt
                                       ↓
                                    postN → vrfy → arch
                                             metpN

Decision diamond: "run GFS this cycle?" - determines if additional downstream jobs execute


Critical Insights for MCP/RAG

  1. The +06 cycle hour box is the synchronization point between GDAS and GFS - this is why temporal awareness matters!

  2. Wave components (waveinit, waveprep, wavepost*) run in both GDAS and GFS - these share code paths we should cross-reference in Neo4j.

  3. The echgres job bridges forecast to verification - a common failure point.

  4. Metatasks (pink dashed boxes) like ecenN, efcsN, eposN, earcN are ensemble parallelization - errors here cascade across all members.

  5. Dashed arrows = optional dependencies - the system can degrade gracefully.


What This Means for GraphRAG

If we ingested this diagram:

  • 42 job nodes could be created in Neo4j
  • ~60 DEPENDS_ON relationships from the arrows
  • 3 lane nodes (GDAS, EnKF, GFS), with a BELONGS_TO_LANE relationship from each job to its lane
  • Cross-lane links where jobs feed each other

Then a query like "What happens if eupd fails?" could traverse:

eupd -[BLOCKS]-> esfc -[BLOCKS]-> ecmn -[BLOCKS]-> efmn -[BLOCKS]-> epmn

And the RAG would pull documentation for ALL affected downstream jobs, not just eupd.
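
The traversal above is a transitive walk over directed edges. A minimal sketch, using only the BLOCKS chain shown in this appendix as data (the `downstream` helper is a hypothetical name, and a real implementation would query Neo4j rather than a dictionary):

```python
# Directed BLOCKS edges copied from the traversal above.
BLOCKS = {
    "eupd": ["esfc"],
    "esfc": ["ecmn"],
    "ecmn": ["efmn"],
    "efmn": ["epmn"],
}


def downstream(graph, failed):
    """Return every job transitively blocked by `failed`, in traversal order."""
    affected = []
    seen = {failed}
    stack = list(reversed(graph.get(failed, [])))
    while stack:
        job = stack.pop()
        if job in seen:
            continue  # already visited via another path
        seen.add(job)
        affected.append(job)
        stack.extend(reversed(graph.get(job, [])))
    return affected


print(downstream(BLOCKS, "eupd"))  # ['esfc', 'ecmn', 'efmn', 'epmn']
```

The returned list is the set of jobs whose documentation would be pulled into context alongside that of eupd.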

This single diagram encodes more operational knowledge than 50 pages of text. That's why multi-modal ingestion is item #1 on our roadmap!


"The measure of intelligence is the ability to change." - attributed to Albert Einstein

This system will not just answer questions. It will learn, adapt, and evolve with the operational needs of NOAA's weather forecasting mission.