GitLab Pipeline MultiHost Architecture for Global‐Workflow - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki
Multi-Host GitLab CI Architecture for Global Workflow
Overview
The Global Workflow project uses a templated GitLab CI architecture that runs tests in parallel across multiple high-performance computing (HPC) platforms. Job logic lives in shared templates, so each host only declares its own variables, tags, and test matrix. This keeps the configuration reusable and maintainable, and makes adding a new computing host a matter of following the existing pattern.
Architecture Components
1. Main Configuration File (.gitlab-ci.yml)
The primary orchestrator that:
- Defines global pipeline stages (build, setup_tests, run_tests, finalize)
- Sets pipeline-level variables and defaults
- Includes specialized configuration files
- Provides base templates shared across all hosts
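As a rough illustration of how these four responsibilities could fit together, the sketch below shows a minimal top-level file. The stage names, the three included file names, and the `.base_config` template name come from this page. The include paths (repository root), the variable defaults, and the contents of `.base_config` are assumptions made for illustration, not the project's actual settings.

```yaml
# Hypothetical top-level .gitlab-ci.yml (illustrative sketch only)
stages:
  - build
  - setup_tests
  - run_tests
  - finalize

variables:
  PIPELINE_TYPE: "pr_cases"   # assumed default value
  RUN_ON_MACHINES: "all"      # assumed default value

include:
  # Paths assume the files sit at the repository root
  - local: /.gitlab-ci-hosts.yml
  - local: /.gitlab-ci-cases.yml
  - local: /.gitlab-ci-ctests.yml

# Hidden job (leading dot): never runs on its own, only extended by other jobs
.base_config:
  interruptible: true         # assumed setting
```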
2. Specialized Configuration Files
A. .gitlab-ci-hosts.yml - Host-Specific Configurations
- Purpose: Defines which tests run on which hosts
- Key Feature: Per-host test case matrices that are easily configurable
- Extensibility: New hosts can be added by following the established patterns
B. .gitlab-ci-cases.yml - Test Case Templates
- Purpose: Defines reusable templates for standard experiment test cases
- Templates: Setup, execution, and finalization logic
- Scope: End-to-end workflow testing scenarios
C. .gitlab-ci-ctests.yml - CTest Framework
- Purpose: CMake/CTest-based functional testing
- Scope: Individual Rocoto job testing with predefined input data
- Use Case: Quick PR validation via GitHub API
Multi-Host Template Design Pattern
Base Template Structure
All host-specific jobs inherit from shared base templates, ensuring consistency while allowing host-specific customization:
# Base template with common logic
.base_template:
  extends: .base_config
  stage: some_stage
  script:
    - common_logic_here

# Host-specific instantiation
job_name-hostname:
  extends: .base_template
  variables:
    machine: hostname
  tags:
    - hostname
  rules:
    - if: conditions_for_this_host
Host Matrix Configuration
Each host defines its supported test cases through matrix variables:
# Example: Hera host configuration
.hera_cases_matrix: &hera_cases
  - caseName: ["C48_ATM", "C48_S2SW", "C96_atm3DVar", ...]

# Jobs inherit this matrix
run_experiments-hera:
  extends: .run_experiments_template
  parallel:
    matrix: *hera_cases
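To make the matrix expansion concrete, here is a small self-contained sketch with a two-case matrix. The `demo` host name and the `echo` script are placeholders. Only the `caseName` variable, the two case names, and the anchor pattern are taken from the Hera example.

```yaml
stages:
  - run_tests

# Hidden key holding the matrix; the anchor makes it reusable within this file
.demo_cases_matrix: &demo_cases
  - caseName: ["C48_ATM", "C48_S2SW"]

run_experiments-demo:            # "demo" is a placeholder host name
  stage: run_tests
  parallel:
    matrix: *demo_cases
  script:
    # caseName is injected as a CI variable for each generated job
    - echo "Running case ${caseName}"

# GitLab expands the job above into one job per matrix value:
#   run_experiments-demo: [C48_ATM]
#   run_experiments-demo: [C48_S2SW]
```

YAML anchors are resolved per file, so a matrix anchor and the jobs that reference it need to live in the same file. Configuration that must be shared across included files can use `extends` or `!reference` instead.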
Supported Computing Platforms
Current Hosts
| Host | Type | Test Cases Supported | Special Features |
|---|---|---|---|
| Hera | Research HPC | Full test suite (12 cases) | Complete ocean/wave/aerosol testing |
| GAEAC6 | Research HPC | Full test suite (11 cases) | AWS cloud integration |
| Orion | Research HPC | Reduced set (7 cases) | Resource-optimized testing |
| Hercules | Research HPC | Standard set (9 cases) | Balanced testing coverage |
Host-Specific Features
Test Case Distribution Strategy
- Full Suite Hosts (Hera, GAEAC6): Run comprehensive testing including complex coupled models
- Optimized Hosts (Orion): Focus on core atmospheric testing with resource constraints
- Balanced Hosts (Hercules): Standard testing coverage without the most resource-intensive cases
Job Instantiation Process
1. Template Inheritance Chain
graph TD
A[.base_config] --> B[.setup_experiment_template]
A --> C[.run_experiments_template]
A --> D[.build_template]
B --> E[setup_experiments-hera]
B --> F[setup_experiments-orion]
B --> G[setup_experiments-hercules]
C --> H[run_experiments-hera]
C --> I[run_experiments-orion]
C --> J[run_experiments-hercules]
D --> K[build-hera]
D --> L[build-orion]
D --> M[build-hercules]
2. Dynamic Job Creation
For each host, the pipeline defines the following jobs, and GitLab expands each matrix job into one parallel job per matrix entry:
Standard Test Cases (PR Validation)
- setup_experiments-{host}: Parallel jobs for each test case in the host's matrix
- run_experiments-{host}: Parallel execution jobs that depend on setup completion
- finalize_success-{host}: Success reporting and GitHub label management
CTest Framework (Quick Validation)
- setup_ctests-{host}: CMake/CTest environment preparation
- run_ctests-{host}: Parallel CTest execution for specific test labels
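A label-driven CTest job might look roughly like the sketch below. The `run_ctests-hera` job name follows the naming pattern on this page, but the `CTEST_LABEL` variable, the label values, and the `build` directory are hypothetical and stand in for whatever the project actually uses.

```yaml
stages:
  - run_tests

run_ctests-hera:
  stage: run_tests
  tags:
    - hera
  parallel:
    matrix:
      # Hypothetical label names; real labels come from the project's CMake test setup
      - CTEST_LABEL: ["label_a", "label_b"]
  script:
    # -L runs only tests whose labels match the given regular expression.
    # "build" is an assumed build directory (--test-dir needs CMake 3.20 or newer).
    - ctest --test-dir build -L "${CTEST_LABEL}" --output-on-failure
```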
Build Process (Foundation)
- build-{host}: Compilation and environment setup for the specific platform
3. Dependency Chain
# Example dependency flow for Hera
build-hera → setup_experiments-hera → run_experiments-hera → finalize_success-hera
build-hera → setup_ctests-hera → run_ctests-hera
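In GitLab CI terms, a chain like this is usually expressed with `needs`. The following sketch wires up the Hera jobs with placeholder scripts. The job names and stage names come from this page, while the assignment of each job to a stage is an assumption.

```yaml
stages: [build, setup_tests, run_tests, finalize]

build-hera:
  stage: build
  script: ["echo build"]              # placeholder script

setup_experiments-hera:
  stage: setup_tests                  # assumed stage
  needs: [build-hera]
  script: ["echo setup experiments"]  # placeholder script

run_experiments-hera:
  stage: run_tests                    # assumed stage
  needs: [setup_experiments-hera]
  script: ["echo run experiments"]    # placeholder script

finalize_success-hera:
  stage: finalize                     # assumed stage
  needs: [run_experiments-hera]
  script: ["echo finalize"]           # placeholder script
```

With `needs`, a job starts as soon as the jobs it lists have finished, rather than waiting for every job in the previous stage. This lets one host progress through its chain without being held up by slower hosts.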
Pipeline Execution Modes
Mode 1: PR Cases (PIPELINE_TYPE=pr_cases)
- Trigger: GitHub PR events via API
- Scope: Full end-to-end workflow testing
- Duration: Several hours per host
- Purpose: Comprehensive validation before merge
Mode 2: CTests (PIPELINE_TYPE=ctests)
- Trigger: GitHub API for quick validation
- Scope: Individual Rocoto job testing
- Duration: Minutes to hours
- Purpose: Rapid feedback for code changes
Mode 3: Nightly Runs (GFS_CI_RUN_TYPE=nightly)
- Trigger: GitLab scheduled pipelines
- Scope: Full regression testing on develop branch
- Duration: Extended execution with archival
- Purpose: Continuous integration monitoring
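A rule that restricts a job to nightly scheduled pipelines could be sketched as follows. `CI_PIPELINE_SOURCE` is a predefined GitLab variable whose value is `schedule` for scheduled pipelines. The `.nightly_rules` template name and the `nightly_example-hera` job are hypothetical, and the sketch assumes `GFS_CI_RUN_TYPE` is set as a variable on the pipeline schedule.

```yaml
stages:
  - run_tests

# Hypothetical reusable rule set for nightly jobs
.nightly_rules:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $GFS_CI_RUN_TYPE == "nightly"'

nightly_example-hera:                 # hypothetical job name
  extends: .nightly_rules
  stage: run_tests
  tags:
    - hera
  script:
    - echo "nightly run placeholder"  # placeholder script
```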
Conditional Execution Logic
Host Selection
rules:
  - if: ($RUN_ON_MACHINES =~ /\bhera\b|all/) # Run on Hera or all hosts
Pipeline Type Routing
rules:
  - if: $PIPELINE_TYPE == "pr_cases" && $CI_PIPELINE_SOURCE == "trigger"
  - if: $PIPELINE_TYPE == "ctests" && $CI_PIPELINE_SOURCE == "trigger"
GitHub Integration
rules:
  - if: $PR_NUMBER != "0" # Only for actual PRs, not develop branch
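Separate entries under `rules` act as alternatives: GitLab evaluates them in order and uses the first one that matches. When a job should run only if several conditions hold at once, they can be combined with `&&` in a single `if`. The sketch below is one assumed way to combine host selection with pipeline-type routing; the script is a placeholder.

```yaml
stages:
  - run_tests

run_experiments-hera:
  stage: run_tests
  tags:
    - hera
  rules:
    # All three conditions must hold for the job to be added to the pipeline
    - if: '$CI_PIPELINE_SOURCE == "trigger" && $PIPELINE_TYPE == "pr_cases" && $RUN_ON_MACHINES =~ /\bhera\b|all/'
  script:
    - echo "placeholder"   # placeholder script
```

Note that in the pattern `/\bhera\b|all/` the word boundaries apply only to `hera`; the `all` alternative matches any value that contains those three letters.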
Adding New Computing Hosts
Step 1: Define Host Configuration
Add to .gitlab-ci-hosts.yml:
# Define test matrix for new host
.newhost_cases_matrix: &newhost_cases
  - caseName: ["C48_ATM", "C48_S2SW", ...] # Customize based on host capabilities

# Build job
build-newhost:
  extends: .build_template
  variables:
    machine: newhost
  tags:
    - newhost
  rules:
    - if: ($RUN_ON_MACHINES =~ /\bnewhost\b|all/)
Step 2: Add Test Jobs
# Standard cases
setup_experiments-newhost:
  extends: .setup_experiment_template
  variables:
    machine: newhost
  tags:
    - newhost
  parallel:
    matrix: *newhost_cases
  needs:
    - build-newhost

run_experiments-newhost:
  extends: .run_experiments_template
  # ... similar pattern
Step 3: Configure GitLab Runner
- Register GitLab runner on the new host
- Configure runner with appropriate tags
- Ensure access to required software stack
Step 4: Platform Configuration
Add host-specific configurations in:
- dev/ci/platforms/config.{newhost}: Environment and module setup
- env/{NEWHOST}.env: Host-specific environment variables
Error Handling and Reporting
GitHub Integration
- PR Labels: Automatic labeling based on pipeline state
  - CI-{Host}-Building: During compilation
  - CI-{Host}-Running: During test execution
  - CI-{Host}-Passed: On successful completion
  - CI-{Host}-Failed: On any failure
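One way such label transitions could be driven from the finalize stage is sketched below, using the GitHub CLI. This is an assumption for illustration: the page does not say which tool the project uses, and the `finalize_failure-hera` job name and the `GITHUB_REPO` variable are hypothetical. The sketch also assumes `gh` is installed on the runner and authenticated (for example through a `GH_TOKEN` CI variable).

```yaml
stages:
  - finalize

finalize_success-hera:
  stage: finalize
  tags: [hera]
  when: on_success        # default: runs only if earlier stages succeeded
  script:
    # Swap the "Running" label for "Passed" on the pull request
    - gh pr edit "${PR_NUMBER}" --repo "${GITHUB_REPO}" --remove-label "CI-Hera-Running" --add-label "CI-Hera-Passed"

finalize_failure-hera:    # hypothetical job name
  stage: finalize
  tags: [hera]
  when: on_failure        # runs only if a job in an earlier stage failed
  script:
    # Swap the "Running" label for "Failed" on the pull request
    - gh pr edit "${PR_NUMBER}" --repo "${GITHUB_REPO}" --remove-label "CI-Hera-Running" --add-label "CI-Hera-Failed"
```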
Failure Reporting
- Error Log Collection: Automated gathering of failed job logs
- GitHub Gist Publishing: Public sharing of error details via GitHub Gists
- PR Comments: Automatic failure notifications with diagnostic links
Cleanup Actions
- Resource Management: Automatic cleanup of failed experiments
- State Tracking: Proper handling of experiment lifecycle states
- Retry Logic: Built-in retry mechanisms for transient failures
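GitLab offers a native `retry` keyword that can cover transient failures. Whether the project relies on it or on logic inside its own scripts is not stated here, so the following is only a sketch of the native option. The `.retry_transient` template name and the example job are hypothetical.

```yaml
stages:
  - run_tests

# Hypothetical template that retries only on infrastructure-type failures
.retry_transient:
  retry:
    max: 2                          # GitLab accepts 0, 1, or 2
    when:
      - runner_system_failure       # runner could not start or run the job
      - stuck_or_timeout_failure    # job got stuck or timed out

retry_example-hera:                 # hypothetical job name
  extends: .retry_transient
  stage: run_tests
  tags:
    - hera
  script:
    - echo "placeholder"            # placeholder script
```

Limiting `when` to infrastructure failures means a genuine test failure is reported right away instead of being retried.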
Benefits of This Architecture
1. Scalability
- Easy addition of new hosts without code duplication
- Parallel execution across multiple platforms
- Resource-aware test case distribution
2. Maintainability
- Single source of truth for test logic in templates
- Host-specific customization through variables and matrices
- Clear separation of concerns between components
3. Flexibility
- Different testing modes for different use cases
- Conditional execution based on trigger type and host availability
- Configurable test case selection per host
4. Reliability
- Comprehensive error handling and reporting
- Integration with GitHub for developer feedback
- Automated cleanup and resource management
This architecture enables the Global Workflow project to maintain high code quality through comprehensive testing across diverse HPC environments while remaining maintainable and extensible for future computing platforms.