Home - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki
Welcome to the NOAA Global Workflow technical wiki. This knowledge base documents solutions, configurations, and insights for operating and developing the Global Forecast System workflow.
(March 16, 2026)
ICSDIR_ROOT-Removal-Impact-Analysis – GraphRAG-assisted analysis of safely removing ICSDIR_ROOT from CI platform configs and case YAMLs
MCP GraphRAG tools (find_env_dependencies, get_code_context, search_documentation) were used to trace ICSDIR_ROOT through the full dependency chain: 6 platform configs → 25 CI case YAMLs → create_experiment.py → setup_expt.py → config.stage_ic.j2 → parm/stage/*.yaml.j2. The analysis confirms the variable is redundant with BASE_IC (defined in dev/workflow/hosts/<platform>.yaml), as config.stage_ic.j2 already has a built-in fallback. Documents 19 safe-to-remove cases, 8 non-standard exceptions, and the required CTest/unit test updates.
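The fallback the analysis relies on amounts to a simple precedence rule; a minimal Python sketch (the function name is hypothetical; the real logic lives in config.stage_ic.j2):

```python
def resolve_ics_root(env):
    """Prefer ICSDIR_ROOT when a case sets it; otherwise fall back to the
    platform's BASE_IC, mirroring the built-in fallback in config.stage_ic.j2."""
    return env.get("ICSDIR_ROOT") or env["BASE_IC"]

# With ICSDIR_ROOT removed, the platform default simply takes over.
print(resolve_ics_root({"BASE_IC": "/scratch/ICs"}))  # → /scratch/ICs
```

This is why removal is safe for the 19 standard cases: deleting the variable just activates the existing default.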
(March 6, 2026)
Parallel-Works-RDHPCS-Platform-Dashboard – Comprehensive inventory of the NOAA RDHPCS Hybrid Cloud: 35 clusters, $945K budget, storage across AWS/Google/Azure
Live-queried dashboard covering all compute clusters (35 total, 7 owned by Terry), cost & budget analysis ($91.6K of $945K spent across 8 groups), storage inventory (33 resources: buckets, NFS, Lustre, disks), networking (4 VPCs, 1 static IP), active sessions, ML workspaces, and platform configuration. Data collected automatically via 26 PW MCP tool queries against the Parallel Works REST API.
(March 6, 2026)
PW-MCP-Toolset-Documentation – Complete reference for the Parallel Works MCP Server: 26 tools across 7 categories
Documents every tool in the parallel-works-mcp server including authentication, compute management, cost analysis, storage (6 tools), networking (3 tools), workflows, and ML workspaces. Covers the Phase 37 SDD expansion (19 → 26 tools, 548 LOC added), the API endpoint coverage map (24 endpoints), an architecture diagram, and a configuration guide. All tools live-tested and verified against the PW v7.15.1 API.
(March 5, 2026)
Neo4j-GraphRAG-Ingestion-Pipeline – Complete guide to how the 41K-node, 589K-relationship knowledge graph gets built from source code
Documents all 15+ ingestion scripts across 4 pipeline stages: Fortran/Python/Shell code graph creation, vector embeddings, documentation crawling, and hierarchical community detection with LLM summarization. Includes execution order, node/relationship counts, and architecture diagrams.
(March 3, 2026)
MCP-RAG-Platform-32-Day-Achievement-Synopsis – Five major breakthroughs from Jan 30 – Mar 3, 2026 (v7.10.0 → v7.25.1)
Sixteen releases spanning hierarchical GraphRAG communities (1,036 nodes, 828 LLM summaries), cross-language graph unification (Shell, Fortran, Python), 5 new agentic MCP tools with session state tracking, NCEPLIBS documentation ingestion, and an instruction file architecture that reduced agent context window usage by 35%. Neo4j relationships grew from ~485K to 589K; ChromaDB documents from ~60K to 66.5K; 14 SDD sessions completed with 0 abandoned.
(February 24, 2026)
Interactive SVG diagram of the full Spec-Driven Development pipeline
This visual reference documents the end-to-end development process used by the EIB MCP-RAG platform, where Spec-Driven Development (SDD) governs the lifecycle from conceptual design through autonomous code implementation and back to human review.
| Lane | Steps | Actor |
|---|---|---|
| Human + AI (IDE) | 1. Discover → 2. Spec Design → 3. Resolve Decisions → 4. Prep Handoff | Interactive (VS Code Copilot) |
| CLI Agent (--yolo) | 5. /plan Decompose → 6. Start Session → 7. Implement + Record | Autonomous (Copilot CLI) |
| Validation Gates | generate-tool-docs --check → npm test → health check → Docker rebuild → Gateway verify | Automated |
| Persistent State | active_session.json, history.jsonl, checkpoints/, workflows/ | Phase 31 SessionManager |
The diagram illustrates 8 numbered steps across swim lanes, with the CLI /plan handoff (Step 4 → 5) as the bridge between human-guided design and autonomous implementation. Both modalities share the same filesystem-based session state (Phase 31 model), enabling real-time monitoring from either side.
Includes embedded description with validation pipeline details, persistent state lifecycle, and modality-aware execution principles. If the inline preview is blocked by your browser, open the HTML file directly.
(February 24, 2026)
GraphRAG-Hierarchical-Community-Materialization – 4-Level Navigable Knowledge Graph of the Global Workflow
The NOAA Global Workflow now has a hierarchical community structure in Neo4j: 1,036 Community nodes across 4 levels, enabling multi-resolution understanding of how 40,000+ code entities organize into subsystems and how those subsystems interact.
| Metric | Before | After |
|---|---|---|
| Community nodes | 0 | 1,036 (L0: 694, L1: 175, L2: 86, L3: 81) |
| MEMBER_OF relationships | 0 | 21,559 |
| PARENT_OF hierarchy | 0 | 978 edges (valid acyclic tree) |
| INTERACTS_WITH edges | 0 | 1,297 cross-community links |
| Community summaries | 63 flat | 828 hierarchical (4 levels) |
Ask "How does data assimilation interact with the forecast model?" and the system traverses L3 → L2 → L1 → L0 communities, returning subsystem boundaries, interaction strengths, and member-level detail, not just text similarity matches. This is Graph-Guided Semantic Retrieval operating on a structural map of one of the most complex computational workflows on Earth.
Phase 24E-5 – Spec-Driven Development, dual-agent execution + independent verification, 6-minute implementation, 6/6 tests passing.
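The multi-level drill-down described above can be pictured with a toy PARENT_OF hierarchy; community names and this in-memory sketch are illustrative only, as the real structure lives in Neo4j:

```python
# Toy PARENT_OF hierarchy: each community maps to its child communities.
parent_of = {
    "L3:earth_system": ["L2:data_assimilation", "L2:forecast_model"],
    "L2:data_assimilation": ["L1:obs_processing"],
    "L2:forecast_model": ["L1:dynamics"],
    "L1:obs_processing": ["L0:read_bufr"],
    "L1:dynamics": ["L0:fv3_core"],
}

def drill_down(root):
    """Depth-first traversal from a top-level community to its leaves,
    i.e. the coarse-to-fine walk a multi-resolution query performs."""
    stack, visited = [root], []
    while stack:
        node = stack.pop()
        visited.append(node)
        stack.extend(reversed(parent_of.get(node, [])))
    return visited

print(drill_down("L3:earth_system"))
```

Each visited node would carry its own LLM summary, so an answer can be assembled at whatever resolution the question demands.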
(January 13, 2026)
Dynamic_MCP_Server_Self_Provisioning – LLM Agents That Expand Their Own Capabilities
We have achieved a paradigm shift in agentic AI: an AI assistant that can discover, configure, and activate new tool servers autonomously through the Docker MCP Gateway – without CLI commands, config files, or restarts.
When asked about coupled modeling research papers, the LLM autonomously:
| Step | MCP Tool Used | Result |
|---|---|---|
| 1. Discover | `mcp-find` | Found arxiv-mcp-server in catalog |
| 2. Configure | `mcp-config-set` | Set storage path |
| 3. Activate | `mcp-add` | Added 4 new tools live |
| 4. Execute | `mcp-exec` | Searched arXiv, returned papers |
No CLI. No config files. No restarts. Pure MCP tool orchestration.
This transforms the agent from a static tool user to a dynamic capability builder – recognizing what it needs and acquiring those capabilities autonomously.
Gateway management tools: mcp-find, mcp-add, mcp-remove, mcp-config-set, mcp-exec, mcp-create-profile, code-mode
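The four-step sequence can be sketched as a plain orchestration loop; `StubGateway` and `provision_and_run` here are hypothetical stand-ins for the gateway management tools, not the real MCP client API:

```python
class StubGateway:
    """Minimal stand-in for the MCP gateway management tools."""
    def find(self, name): return f"{name}-mcp-server"
    def config_set(self, server, key, value): self.cfg = (server, key, value)
    def add(self, server): return [f"{server}:search"]
    def exec(self, tool, args): return {"tool": tool, "args": args}

def provision_and_run(gateway, query):
    server = gateway.find("arxiv")                            # 1. Discover (mcp-find)
    gateway.config_set(server, "storage_path", "/tmp/arxiv")  # 2. Configure (mcp-config-set)
    tools = gateway.add(server)                               # 3. Activate (mcp-add)
    return gateway.exec(tools[0], {"query": query})           # 4. Execute (mcp-exec)
```

The point of the design is that all four calls are themselves MCP tools, so the loop can run entirely inside one agent session.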
(January 6, 2026)
ADVANCED_FUTURE_WORK – Strategic Development Roadmap for the Q2 2026 Funding Cycle
This comprehensive roadmap outlines the next evolutionary phase of the MCP/RAG system: intelligent, self-improving AI assistance that learns from operational history, understands visual system representations, and provides truly graph-aware semantic reasoning.
| Initiative | Impact | Timeline |
|---|---|---|
| Multi-Modal Visual Understanding | High | Q2 2026 |
| Self-Learning from CI/CD History | Very High | Q2-Q3 2026 |
| True GraphRAG Fusion | Transformational | Q3 2026 |
The document includes a demonstration of multi-modal AI comprehension applied to the GFS v16 Global Model Parallel Sequencing flowchart. Key insights extracted:
- 42 job nodes identified across three swim lanes (GDAS, Hybrid EnKF, GFS)
- ~60 DEPENDS_ON relationships mapped from visual arrows
- Critical synchronization point at the `+06` cycle hour between GDAS and GFS
- Cascade failure analysis: a query like "What happens if `eupd` fails?" could traverse the entire downstream dependency chain
This single diagram encodes more operational knowledge than 50 pages of text.
Estimated team requirement: 3-4 FTEs + LLM fine-tuning expertise
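The cascade-failure idea can be sketched as a reverse reachability walk over DEPENDS_ON edges; the job names below echo GFS job naming, but the graph itself is a toy:

```python
from collections import defaultdict

# depends_on[x] = jobs that x depends on; inverted below to find downstream impact.
depends_on = {
    "anal": ["prep"],
    "eupd": ["eobs"],
    "ecen": ["eupd"],
    "efcs": ["ecen"],
    "fcst": ["anal", "ecen"],
}

def downstream_of(failed_job):
    """All jobs transitively blocked if failed_job fails."""
    dependents = defaultdict(set)
    for job, deps in depends_on.items():
        for d in deps:
            dependents[d].add(job)
    blocked, frontier = set(), [failed_job]
    while frontier:
        for nxt in dependents[frontier.pop()]:
            if nxt not in blocked:
                blocked.add(nxt)
                frontier.append(nxt)
    return blocked

print(sorted(downstream_of("eupd")))  # → ['ecen', 'efcs', 'fcst']
```

On the real graph, the same walk over ~60 extracted DEPENDS_ON edges would answer the "what happens if `eupd` fails?" query directly.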
VSCODE_CODE_CLI_TUNNEL_REFERENCE – Comprehensive guide to `code tunnel` and related remote server commands for VS Code CLI 1.107.1.
The VS Code tunnel feature enables secure remote development through vscode.dev from anywhere – critical for accessing HPC login nodes, cloud VMs, and CI/CD environments without traditional SSH port forwarding.
Key Capabilities:
- Remote Tunnels - Access any machine via browser at `vscode.dev/tunnel/<name>`
- System Service Mode - Persistent always-on connections with `code tunnel service install`
- Authentication - GitHub/Microsoft login with token-based automation support
- Extension Management - Pre-install extensions on remote servers
- Local Web Server - Run VS Code web UI locally with `code serve-web`
HPC Use Case Example:

```shell
# On Hera login node
code tunnel --name hera-login --no-sleep
# Access from anywhere: https://vscode.dev/tunnel/hera-login
```

Essential reference for remote development workflows on RDHPCS platforms.
Machine_System_Conditionals – Comprehensive guide to all platform-specific conditionals in the Global Workflow codebase
This reference documents every location where the codebase performs conditional operations based on the HPC system or machine where the code executes. Critical for platform portability, debugging system-specific issues, and onboarding new HPC platforms.
| Category | Count |
|---|---|
| Supported Platforms | 11 (Hera, Ursa, Orion, Hercules, WCOSS2, Gaea C5/C6, AWS/Azure/Google PW, Container) |
| Shell Detection Scripts | 11+ files with MACHINE_ID conditionals |
| Python Detection | `hosts.py` with `Host` class |
| Host YAML Configs | 11 files in `workflow/hosts/` |
Key Files:
- `ush/detect_machine.sh` – Primary machine detection (hostname + path-based)
- `ush/module-setup.sh` – Platform-specific module loading
- `dev/workflow/hosts.py` – Python `Host` class with `SUPPORTED_HOSTS`
Essential for developers adding new platform support or debugging platform-specific issues.
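Hostname-based detection of the kind `ush/detect_machine.sh` performs can be sketched as pattern matching; the patterns below are illustrative, not the script's actual tables:

```python
import re

# Illustrative hostname patterns only; the real ush/detect_machine.sh
# covers all 11 platforms and also falls back to filesystem-path checks.
HOST_PATTERNS = {
    r"^hfe\d+": "HERA",
    r"^[Oo]rion": "ORION",
    r"^hercules": "HERCULES",
}

def detect_machine(hostname):
    """Return a MACHINE_ID for a known login-node hostname, else UNKNOWN."""
    for pattern, machine_id in HOST_PATTERNS.items():
        if re.match(pattern, hostname):
            return machine_id
    return "UNKNOWN"

print(detect_machine("hfe03"))  # → HERA
```

The path-based fallback matters for compute nodes, whose hostnames often carry no platform hint.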
Docker_MCP_Gateway_MultiUser_Architecture – Comprehensive analysis of Docker MCP Gateway v0.35.0 architecture options for multi-user RDHPCS deployments
This document addresses the challenge of container accumulation when multiple SME developers access the MCP/RAG system via VS Code Remote Tunnels. After deep investigation of the Docker MCP Gateway source code, we discovered that container spawning per session is the intended design, not a bug.
| Option | Approach | Effort | Memory per User |
|---|---|---|---|
| A: type: remote | Gateway proxies to HTTP server | 4-6 hrs | ~200MB |
| B: Default + Cleanup | Accept container spawning, add cron | 30 min | ~2GB |
| C: Direct stdio | Skip gateway for VS Code | 0 | ~200MB |
| D: Hybrid (recommended) | stdio for VS Code, gateway for external | 4-6 hrs | ~200MB |
VS Code Sessions → Direct stdio (mcp.json) → Node.js process (~200MB)
External Clients → Gateway (type: remote) → HTTP MCP Server (:3000)
Both paths share the same ChromaDB + Neo4j databases
Key Insight: VS Code Copilot works excellently with lightweight Node.js stdio processes. Reserve the Docker MCP Gateway for external HTTP clients (n8n, Claude Desktop, API consumers) where container-mediated access adds security value.
Implementation: Three-phase rollout starting with immediate gateway disabling for VS Code, followed by HTTP transport implementation for external clients.
Full analysis with source code references and implementation plan.
Docker MCP Gateway: Enabling MCP-as-a-Service for Enterprise AI Integration (PDF, 11 pages)
This comprehensive technical paper documents how Docker MCP Gateway transforms the Model Context Protocol from a single-client development tool into enterprise-ready multi-client infrastructure. The gateway bridges stdio and HTTP/SSE transports, enabling multiple AI clients (VS Code Copilot, Claude Desktop, LangFlow) to share common MCP tools simultaneously.
Key Topics Covered:
- MCP Transport Mechanisms - Why stdio limits single-client usage and how SSE enables network access
- Gateway Architecture - Protocol bridging, session management, and container lifecycle
- Security Model - Network isolation, authentication, and resource limits
- NOAA Implementation - Production deployment with 32 tools, ChromaDB (14,854 docs), Neo4j (85,894 relationships)
- Lessons Learned - Docker CE compatibility, label format requirements, network trade-offs
Download PDF
Paper authored December 15, 2025 | NOAA EMC Global Workflow MCP Team
Breakthrough Achievement: 85% reduction in AI false positives through SME-driven semantic annotations embedded directly in technical standards documentation.
AI-generated EE2 compliance recommendations suffered from systematic false positives: the AI was recommending patterns not actually required by NCEP operational standards (e.g., set -eu when only set -x is mandated). Traditional approaches required code changes for every correction.
Semantic annotations are machine-readable knowledge embedded in RST documentation that teach AI systems what patterns to recommend – and what to avoid:
```rst
.. mcp:anti_pattern:: adding_set_e_or_set_eu
   :severity: must_not
   :context: operational_scripts
   :sme_justification: Not present in EE2 standards or examples
   :evidence: standards.rst lines 588-595
```

Why This Matters for NOAA:
| Before (Phase 1) | After (Phase 2) |
|---|---|
| 328 false positive violations | 48 legitimate violations |
| Hard-coded rules in JavaScript | SME-maintained RST annotations |
| Changes required programming | Zero code changes to update rules |
| No evidence trail | Complete traceability to EE2 source |
- PHASE_2_HYBRID_ARCHITECTURE_SPECIFICATION - Complete technical specification of the hybrid architecture that generates runtime configuration from semantic embeddings. Covers the 5-component pipeline (EE2 Standards → Annotations → ChromaDB → JSON Config → Scan Tool), validation results, and scalability analysis. Essential reading for understanding how semantic intelligence achieves runtime performance.
- SME_Training_QuickStart - Practical 2-hour training guide for Subject Matter Experts on creating and reviewing semantic annotations. Includes a linguistic framework (for translators/language experts), the 7 MCP directive types, and hands-on exercises. Enables domain experts to maintain compliance intelligence without programming.
- SME Training QuickStart Guide (PDF) - Printable version of the training materials for offline use and in-person training sessions.
The "hybrid" pattern combines the best of two worlds:
```
┌─────────────────────────────────────────────────────────────┐
│ BUILD TIME: Semantic Intelligence                           │
│ ChromaDB embeddings + Neo4j relationships → JSON Config     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ RUNTIME: Static Performance                                 │
│ Load JSON once → O(1) lookup per file → Zero DB queries     │
└─────────────────────────────────────────────────────────────┘
```
Result: Semantic understanding WITHOUT runtime database queries. Scan 647 files in 12 seconds with full evidence traceability.
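A minimal sketch of the build-time/runtime split, assuming a JSON config keyed by context; the file name and rule names are hypothetical:

```python
import json
import os
import tempfile

def build_config(path, annotations):
    """BUILD TIME (run once): materialize semantic-pipeline results
    into a static JSON config. No database is needed after this step."""
    with open(path, "w") as f:
        json.dump(annotations, f)

def load_rules(path):
    """RUNTIME: load the config once; every later check is an O(1)
    dict lookup with zero database queries."""
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "ee2_rules.json")
build_config(path, {"operational_scripts": ["adding_set_e_or_set_eu"]})
rules = load_rules(path)
print(rules["operational_scripts"])  # → ['adding_set_e_or_set_eu']
```

The scan tool pays the embedding and graph costs once at build time, which is what makes 647 files in 12 seconds plausible at runtime.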
This architecture enables a new paradigm for expert-in-the-loop AI development:
- AI generates compliance recommendations using RAG-enhanced search
- SMEs review and identify false positives
- Annotations capture corrections in machine-readable form
- Pipeline regenerates configuration automatically
- AI learns without code changes
This is institutional knowledge preservation – capturing what experts know in a form that makes AI smarter.
RAG_manifolds – Dimensional Conformality in Vector Databases: The Mathematical Foundation of RAG Embedding Spaces
A deep dive into why query and document embeddings must inhabit the same metric space for semantic search to work. On the surface, the cosine similarity formula is undergraduate linear algebra, but the 768-dimensional feature spaces encode recursive linguistic structures, emergent semantic geometry, and the holistic paradox of meaning encoded in vectors. Covers the mathematical foundations (SBERT, DPR, RAG papers), the superposition hypothesis, and the philosophical implications of meaning-as-geometry.
"The feature spaces are indeed a true enigma of recursive and holistic complexities – basic on the surface, infinitely deep upon reflection."
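As a concrete anchor for the "undergraduate linear algebra" surface of the topic, here is the cosine-similarity computation worked in miniature (two dimensions instead of 768):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel directions score ~1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ≈ 1.0
```

The formula is trivial; the dimensional-conformality argument is that it is only meaningful when both vectors come from the same embedding model and space.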
Successfully upgraded the RAG system from all-MiniLM-L6-v2 (384 dimensions) to all-mpnet-base-v2 (768 dimensions), delivering a 50-100% improvement in semantic search quality for domain-specific queries. Empirical testing revealed the previous model achieved only 0.174-0.411 similarity scores on critical workflow terms (below the 0.5 acceptable threshold), prompting an immediate zero-cost upgrade. The new v4 collection achieved 73% completion (532/730 documents), enabling more accurate contextual AI assistance for global-workflow development and operations, with A/B testing and production cutover planned for completion. Full progress report.
Development followed the Empirical Accuracy Principle: all technical claims verified through measurement rather than assumption, ensuring trustworthy AI-assisted development practices.
NEW: Comprehensive documentation of the Model Context Protocol (MCP) server architecture that transforms GitHub Copilot from code completion to autonomous development assistance.
MCP_TOOL_ARCHITECTURE - Deep dive into the 21 specialized tools organized into 5 functional categories:
- WorkflowInfoTools (3) - Foundation layer with instant structural awareness
- CodeAnalysisTools (4) - Graph-based relationship intelligence via Neo4j
- SemanticSearchTools (7) - RAG-enhanced knowledge retrieval with ChromaDB
- OperationalTools (3) - Deep domain intelligence for HPC operations
- GitHubTools (4) - Repository and project collaboration intelligence
Why This Matters: This architecture represents a paradigm shift from "AI that writes code" to "AI that understands systems." By combining filesystem analysis, graph databases (Neo4j), vector embeddings (ChromaDB), and semantic search, the MCP platform enables:
- Autonomous research across documentation, code, and issues
- Impact analysis before making changes (dependency graphs)
- Compliance verification (EE2 standards) during development, not after
- Operational intelligence with HPC-specific guidance
- Collaborative awareness of ongoing work and project history
Configuration Modes:
- `full` - All 21 tools (complete development environment)
- `core` - 7 tools (minimal, no databases required)
- `rag` - 17 tools (RAG without GitHub integration)
The Result: A fully-functional integrated agentic software development platform that doesn't just generate code - it understands architecture, follows standards, prevents breaking changes, and collaborates effectively. This is the future of weather model development at NOAA.
Documentation Status: Version 3.0.0 | Week 2 Consolidated Architecture | November 4, 2025
Comprehensive EE2 compliance audits conducted using the MCP (Model Context Protocol) RAG infrastructure with hybrid semantic search (ChromaDB) and graph-based code analysis (Neo4j). These AI-assisted analyses examined hundreds of job scripts, execution scripts, and utility libraries to identify critical compliance gaps and provide production-ready remediation plans.
- EE2_COMPLIANCE_ANALYSIS_GLOBAL_WORKFLOW - 40+ page comprehensive audit of the global-workflow repository identifying top 5 critical compliance issues blocking operational deployment. Analysis covers 255+ files (172 job scripts, 83 execution scripts, utilities) with detailed remediation plans, production-ready code examples, and 14-week phased implementation timeline.
Key Findings:
- Issue #1 (CRITICAL): Python error handling - 42 scripts lack try-except blocks
- Issue #2 (HIGH): Shell error exits - the `&& true` pattern defeats error detection
- Issue #3 (HIGH): Environment variable validation - `${PDY:-}` defaults to empty
- Issue #4 (MEDIUM-HIGH): Weak utility error handling - `envsubst` failures are silent
- Issue #5 (MEDIUM): Inconsistent `set -e` and missing trap handlers
Provenance: Generated via static analysis and MCP RAG tools examining NOAA-EMC/global-workflow fork using ChromaDB semantic search (730 docs) and Neo4j graph analysis (8709 relationships). Analysis date: November 3, 2025.
- EE2_COMPLIANCE_ANALYSIS_RRFS - Comprehensive EE2 compliance analysis of the RRFS (Rapid Refresh Forecast System) workflow repository. Examined 142+ files (26 jobs, 27 scripts, 35+ utilities, 54 Python modules) using MCP RAG tools. Key discovery: RRFS has better baseline compliance than global-workflow (consistent `set -xue`, custom error functions) but shares critical gaps. 10-week implementation plan with priority remediation targets.
Key Findings:
- Issue #1 (CRITICAL): Missing err_chk function - All 26 job scripts call undefined function
- Issue #2 (HIGH): Python error handling - Better structure than global-workflow but incomplete
- Issue #3 (HIGH): Environment variable validation - Empty string defaults risk invalid paths
- Issue #4 (MEDIUM-HIGH): No trap handlers - Resource leaks on failures
- Issue #5 (MEDIUM): Insufficient error context - Good foundation needs enhancement
RRFS Advantages: Uses set -xue consistently (vs. set -e workarounds), custom print_err_msg_exit with caller context, filesystem operations use *_vrfy wrappers.
Provenance: Generated via MCP RAG hybrid analysis (semantic + graph) of NOAA-EMC/rrfs-workflow repository using ChromaDB vector search and Neo4j dependency mapping. Analysis date: November 3, 2025.
The Global Workflow development infrastructure has evolved to incorporate state-of-the-art RAG (Retrieval-Augmented Generation) and Graph Database technologies, enabling sophisticated agentic AI capabilities for GFS software management and error analysis.
- README_PROVISIONING_V3.1_COMPLETE - Complete provisioning guide for the MCP RAG persistent infrastructure on the ParallelWorks cloud platform. Covers ChromaDB 1.1.1 deployment, Node.js MCP server setup, LangFlow integration, and systemd service configuration for a production-grade persistent storage architecture.
- ENHANCED_INGESTION_ARCHITECTURE - Comprehensive design for Context7-inspired multi-source RAG ingestion across 50+ GFS submodules (3-5M LOC). Details the hybrid triple-store architecture combining ChromaDB (semantic search), Neo4j (graph relationships), and PostgreSQL (temporal data) for intelligent error diagnosis and code understanding.
- CHROMADB_MIGRATION_COMPLETE - Technical documentation of the ChromaDB 0.4.x to 1.1.1 migration, including API compatibility updates, Node.js client integration ([email protected]), and resolution of embedding dimension mismatches for production stability.
The Challenge: The Global Forecast System represents one of the most complex software ecosystems in scientific computing:
- 50+ interconnected repositories (UFS, GDAS, GSI, GOCART, MOM6, CICE, WW3, etc.)
- 3-5 million lines of code across Fortran, Python, C/C++, and CMake
- Deep dependency chains spanning atmospheric dynamics → ocean coupling → data assimilation → post-processing
- Multi-component interactions that traditional documentation cannot capture
The Solution: Hybrid Graph + Vector RAG Architecture
Traditional vector-based RAG (ChromaDB alone) excels at semantic similarity but cannot answer structural questions:
- ✗ "What components are affected if I change FV3 dynamics?"
- ✗ "What's the dependency chain causing this compilation error?"
- ✗ "Which CMakeLists.txt needs to link the GSW library?"
- ✗ "Show me the call graph from model initialization to MPI communication"
Graph RAG (Neo4j + ChromaDB) enables these capabilities:
Error Analysis Workflow:
├─ Semantic Search (ChromaDB): Find similar errors and solutions
├─ Structural Analysis (Neo4j): Trace dependency chains and call graphs
├─ Temporal Context (PostgreSQL): Recent commits and regression patterns
└─ LLM Synthesis: Root cause + Fix instructions + Prevention recommendations
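The three-branch fan-out could be orchestrated as simply as the following sketch; the function and the stub stores are hypothetical, not the MCP server's actual interface:

```python
def diagnose(error_text, vector_store, graph_store, history_store):
    """Fan the error out to all three stores, then hand the merged
    context to an LLM for synthesis (stores are injected callables)."""
    return {
        "similar": vector_store(error_text),   # ChromaDB: similar past errors
        "impact": graph_store(error_text),     # Neo4j: dependency chains
        "recent": history_store(error_text),   # PostgreSQL: recent commits
    }

# Lambdas stand in for the real database clients.
report = diagnose(
    "MPI_Abort in fv3.exe",
    vector_store=lambda q: ["similar-error-123"],
    graph_store=lambda q: ["fv3 -> esmf -> mpi"],
    history_store=lambda q: ["commit abc123"],
)
```

Keeping the stores as injected callables is what lets each backend evolve (or be mocked in tests) without touching the synthesis step.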
The MCP (Model Context Protocol) server provides LLM agents with:
- Deep Code Understanding: Not just text search, but comprehension of component interactions
- Error Diagnosis: 10x faster debugging by combining similar past errors with structural impact analysis
- Impact Prediction: "What breaks if I change X?" before making changes
- Knowledge Retention: Institutional expertise captured in graph relationships
- Cross-Component Reasoning: Trace errors through the UFS → GSI → GDAS → GFS pipeline
Result: Transform debugging from "search documentation and guess" to "query knowledge graph and know."
- ✅ ChromaDB 1.1.1: Production vector database operational
- ✅ Node.js MCP Server: 17 tools for workflow management and RAG search
- ✅ LangFlow UI: Visual workflow builder for RAG pipelines
- 🚧 Neo4j Graph DB: Phase 0 POC approved, weekend implementation planned
- 📋 Enhanced Ingestion: Multi-source ingestion pipeline designed for 50+ repos
Next Milestone: Neo4j proof-of-concept demonstrating dependency graph queries that ChromaDB cannot answer.
- PR673_Comprehensive_Analysis - Complete technical analysis of PR #673, which introduced error catching capability to NCEPLIBS-bufr. This 50+ page analysis covers the architectural design using setjmp/longjmp, implementation details across 51 files, code review insights, testing strategy, and operational impact for NOAA's weather forecasting infrastructure.
- ERROR_CATCHING_IMPLEMENTATION_PLAN - Detailed 17-week implementation plan for extending error catching to 24 additional I/O routines following the PR #673 pattern. The plan divides work into 4 phases by complexity level and includes automated testing frameworks, CI/CD strategies, and comprehensive quality assurance checklists.
- additional_io_routines_for_error_catching - Comprehensive inventory of 38 additional I/O routines organized into 7 complexity levels for systematic error catching implementation. This reference document provides technical details, implementation priorities, and success metrics for achieving complete API coverage in the BUFR library.
The CTest framework provides self-contained test cases for validating individual workflow components. Each test creates an isolated environment with staged inputs from nightly stable baseline runs, enabling independent testing and validation.
- C48_ATM-gfs_fcst_seg0
  - 120-hour deterministic forecast validation (209 output files)
  - 13 input files (atmosphere initial conditions)
  - 18 output files (forecast history files)
  - C48_ATM-gfs_fcst_seg0.yaml
- C48_ATM-gfs_atmos_prod_f000-f002
  - Atmosphere product generation test (f000, f001, f002)
  - 5 input files (forecast history from f000, f001, f002)
  - 12 output files (post-processed products)
  - C48_ATM-gfs_atmos_prod_f000-f002.yaml
- C48_S2SW-gfs_fcst_seg0
  - 48-hour coupled atmosphere-ocean-ice-wave forecast
  - Fixed coupled forecast test with proper restart staging
  - 17 input files (13 atmosphere ICs + 3 restarts + 1 wave prep)
  - 24 output files (18 atmos + 2 ocean + 2 ice + 2 wave)
  - Key fix: Added `H_offset = '-6H'` for staging restart files from the previous cycle
  - C48_S2SW-gfs_fcst_seg0.yaml
- C48_S2SW-gfs_ocean_prod_f006
  - Ocean product generation at forecast hour 6
  - 2 input files (ocean forecast at f006)
  - 2 output files (ocean products)
  - C48_S2SW-gfs_ocean_prod_f006.yaml
- C48_S2SW-gfs_ice_prod_f006
  - Ice product generation at forecast hour 6
  - 2 input files (ice forecast at f006)
  - 2 output files (ice products)
  - C48_S2SW-gfs_ice_prod_f006.yaml
- C48_S2SWA_gefs-gefs_fcst_mem001_seg0
  - GEFS ensemble member 001 coupled forecast (48-hour segment)
  - 17 input files with a unique two-cycle pattern:
    - 13 atmosphere ICs from the current cycle (12Z)
    - 3 restart files from the previous cycle (06Z)
    - 1 wave prep file from the current cycle (12Z)
  - 24 output files (ensemble forecast outputs)
  - GEFS requires different source cycles for ICs vs. restarts
  - Special handling for the `mem001/` subdirectory structure
  - C48_S2SWA_gefs-gefs_fcst_mem001_seg0.yaml

Framework Features:
- Self-contained test environments with isolated EXPDIR
- Input staging from `STAGED_CTESTS` (stable nightly runs)
- Consistent naming convention: `CASE-JOB.yaml`
- Comprehensive validation with input/output file verification
Detailed root-cause analyses of CI failures, performed using the EIB MCP-RAG GraphRAG toolset. Each report includes execution chain tracing, environment variable dependency mapping, and an MCP tool call scorecard.
- C96_atm3DVar-gdas_atmos_prod_f000-Error-Analysis-PR4359 – Unbound variable `paramlistb_f000` in `exglobal_atmos_products.sh`. Caused by PR #4347 adding GCAFS-specific variables to a shared script without updating all config variants. Reverted via PR #4360. (Dec 2025)
- C96C48_hybatmDA-JGLOBAL_ENKF_SFC-Error-Analysis-PR4327 – Missing `COMROOT/date/t00z` file causing a silent `setpdy.sh` failure, propagated through the `jjob_header.sh` error suppression pattern. (Dec 2025)
- C96C48mx500_S2SW_cyc_gfs-atmos_prod_f102_WRITE_ERROR – Write error during atmosphere product generation at forecast hour 102.
- C48_ATM_fail β C48 ATM test case failure analysis.
- GitLab-Pipeline-MultiHost-Architecture-for-Global-Workflow
- Managing-Multiple-GitLab-Pipeline-Scripts
- Monitoring-your-HPC-CI-CD-infrastructure-with-GitLab
- Running-.after_script-Commands-on-a-Remote-Host:-Challenges-and-Solutions
- Configuring-the-Jenkins-Controller's-Node-default-configurations
- Increasing-the-number-of-executors-on-Jenkins-Master-(built-in)-Node
- Scaling-Number-of-agents-on-a-Jenkins-Node
- Spreading-out-executors-on-RDHPCS-head-nodes
- SCM-Timeout-Fix
- Using-a-GitHub-Action-with-WebHooks-to-Dispatch-a-Jenkins-Job-for-a-Specific-PR-and-pass-it-Parameters
- Using-WebHooks-and-PR-Comments-to-Launch-a-Multi-Branch-Jenkins-Job
- Setting-up-a-mirror-of-a-GitHub-Repo-using-GitLab-Community-Edition-Server
- Proposal-for-Creating-Dedicated-emc-bot-Project-in-NWS-EMC-GitLab
- Q-Dev-Health-Check-2026-01-28 - Amazon Q Developer health check transcript showing MCP server v3.6.2 with 35 tools, ChromaDB (14,968 docs), Neo4j (86K relationships), and SDD framework (37 workflows) all operational.
- GitHub-MCP-Tools-installed-for-global-workflow-software-development-and-how-they-work
- Global-workflow-RAG-added-to-MCP-server
- RAG-enhanced-MCP-server-configured-and-all-8-tools-are-now-available
- MCP-RAG-Development-Status
- Differences-and-Similarities-between-MCP-(Model-Context-Protocol)-and-RAG-(Retrieval-Augmented-Generation)-in-agentic-LLM-pipeline
- MASSIVE-IMPROVMENT
- Opps-–-no-RAG-no-go
- Anthropic-Claude-Sonnet-4-has-the-big-picture-ahead-of-me
- Customizing-GitHub's-built-in-Copilot's-PR-Feature
- Appreciated-Collaboration
- Thanks-Devin!!
- Thanks-Devin
- Rocoto-Example
- Tracking-UNKNOWN-states-in-Rocoto-and-suggested-updates
- "Resource-temporarily-unavailable"-when-using-rocotorun
- Fixing-the-Resource-Fork-Error-in-Rocoto-Check-Python-Script
- PID-created-and-release-on-each-call-to-rocotostat
- CROW-workflow-definition-explained
- Polymorphism-examples-in-CROW
- Undocumented-support-of-EcFlow-in-public-release-of-fv3gfs
- AQM-Workflow-and-Global-Workflow-(Community-→-Operations-version)-for-Generating-EcFlow
- GCAFS-Overview - Comprehensive analysis of NOAA's next-generation aerosol and air quality forecasting system. Documents GCAFS architecture, its relationship to global-workflow, development timeline (4,040 commits since 2016), key contributors (Barry Baker, Li Pan, Cory Martin), and operational readiness status. GCAFS represents the fourth major forecasting capability alongside GFS, GEFS, and SFS, integrating the GOCART model for aerosol transport/chemistry. Analysis date: January 30, 2026
- How-are-IC-specified-for-Free-Forecast-Jobs
- Detailed-Analysis-of-HOMEDIR
- Prate_ave-in-atmospheric-sfc-history-files-accumulates-while-cycling
- Lingering-restart-files-can-cause-reβran-cases-to-fail
- MPMD_MPI_Runtime_Infrastructure - Comprehensive documentation of the Multiple-Program Multiple-Data (MPMD) execution framework and MPI runtime configuration across all 11 supported HPC platforms. Covers the core `run_mpmd.sh` orchestration script, platform-specific launcher configurations (Slurm `srun --multi-prog` vs. PBS `mpiexec cfp`), MPI tuning parameters (Intel MPI, Cray MPICH, PMI2), network fabric details (InfiniBand, Slingshot, EFA), and the three-level job resource configuration chain. Essential reference for HPC operations, platform portability, and debugging parallel execution issues. Analysis date: January 30, 2026
- MPMD & MPI Runtime Infrastructure Technical Paper (PDF, 17 pages) - Detailed LaTeX technical specification with architecture diagrams, algorithm pseudocode, platform comparison tables, MPI tuning parameters, and complete environment file appendices.
- Resource Configuration Comparison Technical Paper (PDF, 44 pages) - Comprehensive technical analysis comparing the declarative CROW system (2016-2020) with the current imperative Global-Workflow approach (2020-present). Includes TikZ architecture diagrams, algorithm pseudocode, detailed code examples from the CROW YAML DSL and current shell-based configuration, MPMD runtime integration analysis, validation pipeline recommendations, and architectural guidance for next-generation workflow infrastructure. Covers the complete resource specification lifecycle from definition through validation to runtime execution. Analysis date: January 30, 2026
- Getting-bash-for-GitLab-Runners-to-be-a-login-shell
- Why-Admins-Configure-Role-Accounts-with-login_shell=off
- Is-VAST-like-PFS?
Most Viewed Topics:
- CI/CD Pipeline Architecture
- Rocoto Workflow Management
- MCP/RAG Integration
- Jenkins Configuration
- HPC System Setup
Latest Updates:
- Phase 2 Semantic Annotation Architecture (December 2025)
- SME Training for Semantic Annotations (December 2025)
- Hybrid Build-Time/Runtime Compliance Validation
- MCP Server RAG Enhancement
- AI-Assisted Development Tools
This wiki is actively maintained. Last organized: December 2025