GitLab Pipeline MultiHost Architecture for Global‐Workflow - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki

Multi-Host GitLab CI Architecture for Global Workflow

Overview

The Global Workflow project uses a templated GitLab CI architecture to run tests in parallel across multiple high-performance computing (HPC) platforms. Shared templates keep the job logic in one place, so adding a new computing host means instantiating existing templates rather than duplicating them.

Architecture Components

1. Main Configuration File (`.gitlab-ci.yml`)

The primary orchestrator that:

Defines global pipeline stages (build, setup_tests, run_tests, finalize)
Sets pipeline-level variables and defaults
Includes specialized configuration files
Provides base templates shared across all hosts
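
A minimal sketch of how such an orchestrating file could be laid out. The stage names come from the list above. The include paths assume the three specialized files sit at the repository root, and the variable defaults and the contents of `.base_config` are illustrative assumptions, not the project's actual settings.

```yaml
# .gitlab-ci.yml: illustrative layout only
stages:
  - build
  - setup_tests
  - run_tests
  - finalize

# Pipeline-level variables (the default values here are assumptions)
variables:
  PIPELINE_TYPE: "pr_cases"
  RUN_ON_MACHINES: "all"

# Pull in the specialized configuration files (paths are assumed)
include:
  - local: "/.gitlab-ci-hosts.yml"
  - local: "/.gitlab-ci-cases.yml"
  - local: "/.gitlab-ci-ctests.yml"

# Hidden job (leading dot) that other templates extend; contents are hypothetical
.base_config:
  interruptible: false
  before_script:
    - echo "Running on ${machine}"
```

Hidden jobs are never run on their own, which is what makes them usable as shared templates.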

2. Specialized Configuration Files

A. `.gitlab-ci-hosts.yml` - Host-Specific Configurations

Purpose: Defines which tests run on which hosts
Key Feature: Per-host test case matrices that are easily configurable
Extensibility: New hosts can be added by following the established patterns

B. `.gitlab-ci-cases.yml` - Test Case Templates

Purpose: Defines reusable templates for standard experiment test cases
Templates: Setup, execution, and finalization logic
Scope: End-to-end workflow testing scenarios

C. `.gitlab-ci-ctests.yml` - CTest Framework

Purpose: CMake/CTest-based functional testing
Scope: Individual Rocoto job testing with predefined input data
Use Case: Quick PR validation via GitHub API

Multi-Host Template Design Pattern

Base Template Structure

All host-specific jobs inherit from shared base templates, ensuring consistency while allowing host-specific customization:

# Base template with common logic
.base_template:
  extends: .base_config
  stage: some_stage
  script:
    - common_logic_here

# Host-specific instantiation
job_name-hostname:
  extends: .base_template
  variables:
    machine: hostname
  tags:
    - hostname
  rules:
    - if: conditions_for_this_host
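
GitLab resolves `extends` by deep-merging hashes, with the extending job's keys taking precedence, while arrays such as `script`, `tags`, and `rules` are replaced rather than concatenated. As a sketch, assuming a hypothetical `.base_config` that sets a single variable, the `job_name-hostname` job above would resolve to roughly the following:

```yaml
# Assumed contents of .base_config, for illustration only
.base_config:
  variables:
    GIT_STRATEGY: fetch

# Effective job after GitLab resolves the extends chain
job_name-hostname:
  stage: some_stage                  # inherited from .base_template
  script:
    - common_logic_here              # inherited from .base_template
  variables:
    GIT_STRATEGY: fetch              # inherited from .base_config (hashes are merged)
    machine: hostname                # added by the host job
  tags:
    - hostname
  rules:
    - if: conditions_for_this_host   # replaces any rules array defined in a template
```

Because arrays are replaced, a host job that defines its own `rules` does not inherit rule entries from its template; any condition that must still apply has to be restated in the host job.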

Host Matrix Configuration

Each host defines its supported test cases through matrix variables:

# Example: Hera host configuration
.hera_cases_matrix: &hera_cases
  - caseName: ["C48_ATM", "C48_S2SW", "C96_atm3DVar", ...]

# Jobs inherit this matrix
run_experiments-hera:
  extends: .run_experiments_template
  parallel:
    matrix: *hera_cases
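
For reference, `parallel: matrix` expands one job definition into one job per value, and each generated job receives that value as an environment variable. A sketch with a two-case matrix follows; the case names are taken from the example above, while the `demo` job name, the shortened list, and the `echo` script are illustrative only.

```yaml
.demo_cases_matrix: &demo_cases
  - caseName: ["C48_ATM", "C48_S2SW"]

run_experiments-demo:
  stage: run_tests
  parallel:
    matrix: *demo_cases
  script:
    - echo "Running case ${caseName}"

# GitLab generates two jobs from the definition above:
#   run_experiments-demo: [C48_ATM]    with caseName=C48_ATM
#   run_experiments-demo: [C48_S2SW]   with caseName=C48_S2SW
```

YAML anchors only resolve within the file that defines them, so a matrix definition and the jobs that reference it need to live in the same file.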

Supported Computing Platforms

Current Hosts

Host	Type	Test Cases Supported	Special Features
Hera	Research HPC	Full test suite (12 cases)	Complete ocean/wave/aerosol testing
GAEAC6	Research HPC	Full test suite (11 cases)	AWS cloud integration
Orion	Research HPC	Reduced set (7 cases)	Resource-optimized testing
Hercules	Research HPC	Standard set (9 cases)	Balanced testing coverage

Host-Specific Features

Test Case Distribution Strategy

Full Suite Hosts (Hera, GAEAC6): Run comprehensive testing including complex coupled models
Optimized Hosts (Orion): Focus on core atmospheric testing with resource constraints
Balanced Hosts (Hercules): Standard testing coverage without the most resource-intensive cases

Job Instantiation Process

1. Template Inheritance Chain

graph TD
    A[.base_config] --> B[.setup_experiment_template]
    A --> C[.run_experiments_template]
    A --> D[.build_template]
    
    B --> E[setup_experiments-hera]
    B --> F[setup_experiments-orion]
    B --> G[setup_experiments-hercules]
    
    C --> H[run_experiments-hera]
    C --> I[run_experiments-orion] 
    C --> J[run_experiments-hercules]
    
    D --> K[build-hera]
    D --> L[build-orion]
    D --> M[build-hercules]

2. Dynamic Job Creation

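For each host, the configuration defines the jobs below. Jobs that use a `parallel: matrix` are expanded by GitLab into one job per test case or test label: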

Standard Test Cases (PR Validation)

setup_experiments-{host}: Parallel jobs for each test case in the host's matrix
run_experiments-{host}: Parallel execution jobs that depend on setup completion
finalize_success-{host}: Success reporting and GitHub label management

CTest Framework (Quick Validation)

setup_ctests-{host}: CMake/CTest environment preparation
run_ctests-{host}: Parallel CTest execution for specific test labels
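
A hedged sketch of how label-driven CTest jobs might look. The `CTEST_LABEL` variable, the label values, the `build` directory, and the job bodies are all hypothetical, and the fragment assumes that `build-hera` is defined elsewhere and that both jobs see the same build directory on a shared filesystem. Only the `ctest` options `--test-dir`, `-L`, and `--output-on-failure` are standard CTest flags (`--test-dir` requires CMake 3.20 or newer).

```yaml
setup_ctests-hera:
  stage: setup_tests
  tags:
    - hera
  needs:
    - build-hera
  script:
    # Configure the CTest tree (source and build paths are assumptions)
    - cmake -S . -B build

run_ctests-hera:
  stage: run_tests
  tags:
    - hera
  needs:
    - setup_ctests-hera
  parallel:
    matrix:
      - CTEST_LABEL: ["label_a", "label_b"]   # hypothetical labels
  script:
    # Run only the tests whose label matches this job's value
    - ctest --test-dir build -L "${CTEST_LABEL}" --output-on-failure
```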

Build Process (Foundation)

build-{host}: Compilation and environment setup for the specific platform

3. Dependency Chain

# Example dependency flow for Hera
build-hera → setup_experiments-hera → run_experiments-hera → finalize_success-hera
build-hera → setup_ctests-hera → run_ctests-hera
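
In GitLab CI, a chain like this is typically expressed with `needs`, which lets a job start as soon as the jobs it depends on have finished. The fragment below sketches the first chain. The template names follow the inheritance diagram above, but the mapping of jobs to stages and the body of the finalize job are assumptions.

```yaml
build-hera:
  extends: .build_template
  stage: build

setup_experiments-hera:
  extends: .setup_experiment_template
  stage: setup_tests
  needs:
    - build-hera

run_experiments-hera:
  extends: .run_experiments_template
  stage: run_tests
  needs:
    - setup_experiments-hera   # waits for every parallel instance of the setup job

finalize_success-hera:
  stage: finalize
  needs:
    - run_experiments-hera
  script:
    - echo "All Hera experiments passed"   # placeholder body
```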

Pipeline Execution Modes

Mode 1: PR Cases (`PIPELINE_TYPE=pr_cases`)

Trigger: GitHub PR events via API
Scope: Full end-to-end workflow testing
Duration: Several hours per host
Purpose: Comprehensive validation before merge

Mode 2: CTests (`PIPELINE_TYPE=ctests`)

Trigger: GitHub API for quick validation
Scope: Individual Rocoto job testing
Duration: Minutes to hours
Purpose: Rapid feedback for code changes

Mode 3: Nightly Runs (`GFS_CI_RUN_TYPE=nightly`)

Trigger: GitLab scheduled pipelines
Scope: Full regression testing on develop branch
Duration: Extended execution with archival
Purpose: Continuous integration monitoring

Conditional Execution Logic

Host Selection

rules:
  - if: ($RUN_ON_MACHINES =~ /\bhera\b|all/)  # Run on Hera or all hosts

Pipeline Type Routing

rules:
  - if: $PIPELINE_TYPE == "pr_cases" && $CI_PIPELINE_SOURCE == "trigger"
  - if: $PIPELINE_TYPE == "ctests" && $CI_PIPELINE_SOURCE == "trigger"

GitHub Integration

rules:
  - if: $PR_NUMBER != "0"  # Only for actual PRs, not develop branch
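
Within a `rules` list, entries are evaluated in order and the first match decides whether the job is added, so separate entries behave as alternatives. Conditions that must all hold therefore belong in a single `if` joined with `&&`. The sketch below combines the host and pipeline-type checks for one host; whether the project combines them in exactly this way is an assumption.

```yaml
run_experiments-hera:
  extends: .run_experiments_template
  rules:
    # Add the job only for triggered PR-case pipelines that target Hera (or all hosts).
    # If no rule matches, the job is left out of the pipeline.
    - if: ($PIPELINE_TYPE == "pr_cases") && ($CI_PIPELINE_SOURCE == "trigger") && ($RUN_ON_MACHINES =~ /\bhera\b|all/)
```

Note that the `all` alternative in the pattern is unanchored, so any `RUN_ON_MACHINES` value containing the substring `all` matches.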

Adding New Computing Hosts

Step 1: Define Host Configuration

Add to `.gitlab-ci-hosts.yml`:

# Define test matrix for new host
.newhost_cases_matrix: &newhost_cases
  - caseName: ["C48_ATM", "C48_S2SW", ...]  # Customize based on host capabilities

# Build job
build-newhost:
  extends: .build_template
  variables:
    machine: newhost
  tags:
    - newhost
  rules:
    - if: ($RUN_ON_MACHINES =~ /\bnewhost\b|all/)

Step 2: Add Test Jobs

# Standard cases
setup_experiments-newhost:
  extends: .setup_experiment_template
  variables:
    machine: newhost
  tags:
    - newhost
  parallel:
    matrix: *newhost_cases
  needs:
    - build-newhost

run_experiments-newhost:
  extends: .run_experiments_template
  # ... similar pattern

Step 3: Configure GitLab Runner

Register GitLab runner on the new host
Configure runner with appropriate tags
Ensure access to required software stack

Step 4: Platform Configuration

Add host-specific configurations in:

`dev/ci/platforms/config.{newhost}`: Environment and module setup
`env/{NEWHOST}.env`: Host-specific environment variables

Error Handling and Reporting

GitHub Integration

PR Labels: Automatic labeling based on pipeline state
- CI-{Host}-Building: During compilation
- CI-{Host}-Running: During test execution
- CI-{Host}-Passed: On successful completion
- CI-{Host}-Failed: On any failure
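
One way such a label transition could be implemented is with the GitHub CLI from a finalize job. This is a sketch, not the project's actual mechanism: it assumes `gh` is installed and authenticated on the runner, that `PR_NUMBER` is passed to the pipeline, and that `OWNER/REPO` is replaced with the real repository slug.

```yaml
finalize_success-hera:
  stage: finalize
  tags:
    - hera
  script:
    # Swap the Running label for Passed on the pull request
    - gh pr edit "${PR_NUMBER}" --repo "OWNER/REPO" --remove-label "CI-Hera-Running" --add-label "CI-Hera-Passed"
```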

Failure Reporting

Error Log Collection: Automated gathering of failed job logs
GitHub Gist Publishing: Public sharing of error details via GitHub Gists
PR Comments: Automatic failure notifications with diagnostic links

Cleanup Actions

Resource Management: Automatic cleanup of failed experiments
State Tracking: Proper handling of experiment lifecycle states
Retry Logic: Built-in retry mechanisms for transient failures
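
GitLab offers built-in keywords that can support this kind of handling. The sketch below is an assumption about how retry and cleanup could be wired in, not a description of the project's implementation; `.resilient_job` and `cleanup_experiment.sh` are hypothetical names.

```yaml
.resilient_job:
  retry:
    max: 2                        # GitLab allows at most 2 retries
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
  after_script:
    # after_script runs even when the main script fails
    - if [ "${CI_JOB_STATUS}" = "failed" ]; then ./cleanup_experiment.sh "${caseName}"; fi
```

Limiting `retry: when` to infrastructure-type failures avoids rerunning jobs that failed because of a genuine test error.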

Benefits of This Architecture

1. Scalability

Easy addition of new hosts without code duplication
Parallel execution across multiple platforms
Resource-aware test case distribution

2. Maintainability

Single source of truth for test logic in templates
Host-specific customization through variables and matrices
Clear separation of concerns between components

3. Flexibility

Different testing modes for different use cases
Conditional execution based on trigger type and host availability
Configurable test case selection per host

4. Reliability

Comprehensive error handling and reporting
Integration with GitHub for developer feedback
Automated cleanup and resource management

This architecture lets the Global Workflow project test comprehensively across diverse HPC environments while keeping the CI configuration maintainable and extensible to future computing platforms.
