GCP ML Pipeline Architecture - stanlypoc/AIRA GitHub Wiki
# 📦 GCP ML Pipeline Architecture

## 📊 Architecture Overview
```mermaid
graph TD
    A[Data Sources] --> B[Cloud Storage]
    B --> C[Vertex AI Data Labeling]
    C --> D[BigQuery]
    D --> E[Vertex AI Feature Store]
    E --> F[Vertex AI Training]
    F --> G[Vertex AI Model Registry]
    G --> H[Vertex AI Endpoints]
    H --> I[Cloud Functions/Cloud Run]
    I --> J[End Users]
    K[CI/CD Pipeline] -->|Triggers| F
    K -->|Deploys| H
    L[IAM & Security] --> AllComponents[All Components]
```
## 🧩 Component Details

### 1. Data Ingestion & Storage

**Cloud Storage (GCS)**
- Purpose: Raw data lake for unstructured/semi-structured data
- Scalability: Auto-scales with multi-regional storage classes
- Security: IAM roles (`storage.objectAdmin`, `storage.objectViewer`)
- Example: `gs://my-ml-data-bucket/raw-images/`
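As a sketch, the bucket above could be provisioned with Terraform (the same tool used in the IAM setup later on this page); the bucket name and location are placeholders, not fixed project values:

```hcl
# Hypothetical Terraform sketch: raw-data bucket for the ML data lake.
resource "google_storage_bucket" "ml_raw_data" {
  name                        = "my-ml-data-bucket" # placeholder name
  location                    = "US"                # multi-region storage
  storage_class               = "STANDARD"
  uniform_bucket_level_access = true                # IAM-only access control
}
```

Uniform bucket-level access keeps permissions manageable through the `storage.objectAdmin`/`storage.objectViewer` roles rather than per-object ACLs.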
**BigQuery**
- Purpose: Structured data warehouse for tabular data
- Performance: Columnar storage + BI Engine
- IAM: `bigquery.dataEditor`
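A minimal Terraform sketch for a curated dataset, with the `bigquery.dataEditor` role granted at dataset scope; the dataset ID and service-account email are placeholders:

```hcl
# Hypothetical dataset for curated tabular training data.
resource "google_bigquery_dataset" "ml_features" {
  dataset_id = "ml_features" # placeholder ID
  location   = "US"
}

# Grant dataEditor on the dataset only, not project-wide.
resource "google_bigquery_dataset_iam_member" "editor" {
  dataset_id = google_bigquery_dataset.ml_features.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = "serviceAccount:ml-sa@my-project.iam.gserviceaccount.com" # placeholder
}
```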
### 2. Data Preparation

**Vertex AI Data Labeling**
- Purpose: Human-in-the-loop labeling with AI tools
- Availability: Global workforce, 99.9% SLA
- IAM: `aiplatform.dataLabeler`
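Labeling jobs operate on a Vertex AI managed dataset; a hedged Terraform sketch for an image dataset follows (the display name is a placeholder; the schema URI is Google's published image-dataset schema):

```hcl
# Hypothetical Vertex AI dataset that labeling jobs would target.
resource "google_vertex_ai_dataset" "raw_images" {
  display_name        = "raw-images" # placeholder
  region              = "us-central1"
  metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"
}
```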
**Vertex AI Feature Store**
- Purpose: Centralized feature repository
- Scalability: 10M+ QPS
- IAM: `aiplatform.featureStoreAdmin`
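A featurestore with fixed online-serving capacity can be declared in Terraform as a sketch; the name and node count are assumptions to be sized against your actual read QPS:

```hcl
# Hypothetical featurestore for low-latency online feature serving.
resource "google_vertex_ai_featurestore" "ml_features" {
  name   = "ml_features" # placeholder
  region = "us-central1"
  online_serving_config {
    fixed_node_count = 2 # size for expected online read QPS
  }
}
```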
### 3. Model Training

**Vertex AI Training**
- Purpose: Managed training with AutoML or custom containers
- Performance: Supports GPUs (A100/V100) and TPU v4 pods
- IAM: `aiplatform.user`, `aiplatform.customCodeServiceAgent`
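Training jobs themselves are launched at runtime, but the service account they run as can be managed in Terraform. A sketch, with placeholder account ID and project:

```hcl
# Hypothetical dedicated service account for training jobs.
resource "google_service_account" "training" {
  account_id   = "vertex-training" # placeholder
  display_name = "Vertex AI training jobs"
}

# Grant the aiplatform.user role listed above to that account.
resource "google_project_iam_member" "training_user" {
  project = "my-project" # placeholder project ID
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.training.email}"
}
```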
### 4. Model Management

**Vertex AI Model Registry**
- Purpose: Versioned model storage with lineage
- Security: CMEK encryption
- IAM: `aiplatform.modelAdmin`
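The CMEK mentioned above needs a Cloud KMS key in the same region as the model resources. A hedged sketch (key-ring and key names are placeholders):

```hcl
# Hypothetical KMS key for customer-managed encryption of model artifacts.
resource "google_kms_key_ring" "ml" {
  name     = "ml-keyring" # placeholder
  location = "us-central1"
}

resource "google_kms_crypto_key" "model_cmek" {
  name            = "model-cmek"
  key_ring        = google_kms_key_ring.ml.id
  rotation_period = "7776000s" # rotate every 90 days
}
```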
### 5. Serving

**Vertex AI Endpoints**
- Purpose: Fully managed online model serving
- Scalability: Auto-scaling prediction nodes, handles 100K+ RPS
- IAM: `aiplatform.endpointAdmin`
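The endpoint resource itself is lightweight in Terraform; models are deployed to it afterwards (via the console, `gcloud`, or the SDK). A sketch with placeholder names:

```hcl
# Hypothetical serving endpoint; models are deployed to it separately.
resource "google_vertex_ai_endpoint" "serving" {
  name         = "serving-endpoint" # placeholder
  display_name = "serving-endpoint"
  location     = "us-central1"
}
```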
### 6. Orchestration

**Cloud Composer (Airflow)**
- Purpose: Workflow orchestration
- Availability: Regional HA
- IAM: `composer.worker`
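A minimal Composer 2 environment sketch in Terraform; the environment name is a placeholder and the `image_version` alias should be checked against currently supported versions:

```hcl
# Hypothetical Composer 2 environment for pipeline orchestration.
resource "google_composer_environment" "ml_orchestration" {
  name   = "ml-orchestration" # placeholder
  region = "us-central1"
  config {
    software_config {
      image_version = "composer-2-airflow-2" # alias; verify supported versions
    }
  }
}
```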
## ⚙️ Architecture Attributes

### 1. Scalability
- Horizontal: Distributed training (Vertex AI), auto-scaling BigQuery slots
- Vertical: GPU/TPU acceleration for endpoints

### 2. Performance
- Low latency: <10 ms feature reads from Feature Store, Triton-optimized predictions
- High throughput: TB-scale BigQuery query handling

### 3. Availability
- Multi-region: GCS + Vertex AI global footprint
- SLA: Vertex AI (99.9%), BigQuery (99.95%)

### 4. Security
- Encryption: Google-managed or CMEK at rest, TLS 1.3 in transit
- Network: VPC Service Controls (VPC-SC), Private IP, IAM Conditions

### 5. IAM Setup
```hcl
# Terraform example for IAM bindings
resource "google_project_iam_binding" "ml_engineer" {
  project = "my-project" # placeholder project ID
  role    = "roles/aiplatform.user"
  members = ["user:[email protected]"]
}

resource "google_storage_bucket_iam_binding" "data_read" {
  bucket  = "my-ml-data-bucket"
  role    = "roles/storage.objectViewer"
  members = ["serviceAccount:[email protected]"]
}
```
## 🔁 CI/CD Pipeline

```mermaid
graph LR
    A[GitHub/Cloud Source Repos] --> B[Cloud Build]
    B -->|Triggers| C[Vertex AI Pipeline]
    C --> D[Model Registry]
    D --> E[Vertex AI Endpoint]
    E --> F[Cloud Deploy]
    F --> G[Promotion to Prod]
```
**Tools**
- Cloud Build: Build and deploy with `cloudbuild.yaml`
- Vertex AI Pipelines: Kubeflow-based orchestration
- Cloud Deploy: Canary and promotion-based rollouts
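The GitHub-to-Cloud-Build trigger in the diagram can also be declared in Terraform; a sketch, with repository owner/name as placeholders:

```hcl
# Hypothetical trigger: run cloudbuild.yaml on pushes to main.
resource "google_cloudbuild_trigger" "ml_pipeline" {
  name     = "ml-pipeline-trigger" # placeholder
  filename = "cloudbuild.yaml"
  github {
    owner = "my-org"    # placeholder
    name  = "my-ml-repo" # placeholder
    push {
      branch = "^main$"
    }
  }
}
```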
**IAM**
- `cloudbuild.builds.editor`
- `aiplatform.pipelineRunner`
## ⭐ Key Advantages

- Fully Managed: No infrastructure overhead
- MLOps Native: Integrated model monitoring and rollback
- Cost Efficiency: Preemptible/Spot training VMs, auto-scaling inference
💡 Tip: Add Cloud Armor for DDoS protection and Dataflow for streaming ETL in production.
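The Cloud Armor suggestion above can be sketched as a Terraform security policy attached in front of the serving layer; the policy name and IP range are illustrative placeholders:

```hcl
# Hypothetical Cloud Armor policy: block one range, allow everything else.
resource "google_compute_security_policy" "ml_edge" {
  name = "ml-edge-policy" # placeholder

  rule {
    action   = "deny(403)"
    priority = 1000
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["9.9.9.0/24"] # example blocked range
      }
    }
  }

  rule {
    action   = "allow"
    priority = 2147483647 # default catch-all rule
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
  }
}
```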