Azure ML Pipeline Architecture - stanlypoc/AIRA GitHub Wiki


Azure ML Pipeline Architecture

Architecture Diagram

```mermaid
graph TD
    A[Data Sources] --> B[Azure Blob Storage]
    B --> C[Azure Data Factory]
    C --> D[Azure Databricks]
    D --> E[Azure ML Data Labeling]
    E --> F[Azure ML Feature Store]
    F --> G[Azure ML Training]
    G --> H[Azure ML Model Registry]
    H --> I[Azure ML Online Endpoints]
    I --> J[Azure Functions/App Service]
    J --> K[End Users]
    L[CI/CD Pipeline] -->|Triggers| G
    L -->|Deploys| I
    M["IAM & Security"] -.->|Applies to| N[All Components]
```

Component Details

1. Data Ingestion & Storage

Azure Blob Storage

  • Used for storing raw and processed data.
  • Supports Hot, Cool, Cold, and Archive access tiers.
  • Access control via RBAC and SAS tokens.
  • Example path: https://<account>.blob.core.windows.net/ml-data/raw/
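Movement between access tiers is usually automated with a lifecycle management policy rather than done by hand. A minimal sketch, assuming the `ml-data` container and `raw/` prefix from the example path and illustrative thresholds of 30, 90, and 365 days (assumptions, not values from this page), builds such a policy as a Python dict and prints it as the JSON that a storage account's lifecycle management settings accept:

```python
import json


def build_lifecycle_policy(container: str, prefix: str, cool_days: int,
                           archive_days: int, delete_days: int) -> dict:
    """Build a lifecycle management policy that moves block blobs under
    container/prefix to Cool, then Archive, and finally deletes them."""
    if not cool_days < archive_days < delete_days:
        raise ValueError("thresholds must increase: cool < archive < delete")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-raw-ml-data",
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Prefixes in a lifecycle rule start with the container name.
                        "prefixMatch": [f"{container}/{prefix}"],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_days},
                        }
                    },
                },
            }
        ]
    }


# Thresholds are illustrative assumptions.
policy = build_lifecycle_policy("ml-data", "raw/", 30, 90, 365)
print(json.dumps(policy, indent=2))
```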

Azure Data Lake Storage Gen2

  • Built on Blob Storage with the hierarchical namespace enabled, which suits big data analytics.
  • Supports fine-grained access via POSIX-like ACLs and Azure RBAC.
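ACL checks follow POSIX-style rules: the owning user's entry applies directly, and a named-user entry is limited by the mask. The helpers below are a simplified, hypothetical checker: they skip owning-group and named-group entries (a principal matching no user entry falls back to `other`), and they ignore the fact that in ADLS Gen2 Azure RBAC role assignments are evaluated before ACLs. The object IDs are placeholders:

```python
def parse_acl(acl: str) -> dict[tuple[str, str], str]:
    """Parse an access ACL such as 'user::rwx,user:<oid>:r-x,mask::r-x,other::---'
    into {(tag, qualifier): permissions}; ('user', '') is the owning user."""
    entries: dict[tuple[str, str], str] = {}
    for entry in acl.split(","):
        if entry.startswith("default:"):
            continue  # default ACLs only seed new child items
        tag, qualifier, perms = entry.split(":")
        entries[(tag, qualifier)] = perms
    return entries


def has_permission(acl: str, principal_id: str, owner_id: str, perm: str) -> bool:
    """Simplified check of one permission ('r', 'w' or 'x'). Group entries are
    not evaluated, so access granted only through a group reads as denied."""
    entries = parse_acl(acl)
    i = "rwx".index(perm)
    if principal_id == owner_id:
        # The owning user's entry is not filtered by the mask.
        return entries[("user", "")][i] != "-"
    if ("user", principal_id) in entries:
        # Named-user entries are filtered by the mask entry, if present.
        mask = entries.get(("mask", ""), "rwx")
        return entries[("user", principal_id)][i] != "-" and mask[i] != "-"
    return entries[("other", "")][i] != "-"


# Placeholder object IDs; real ones are Microsoft Entra object IDs.
acl = "user::rwx,user:1111:rwx,group::r-x,mask::r-x,other::---"
print(has_permission(acl, "1111", owner_id="0000", perm="r"))  # True
print(has_permission(acl, "1111", owner_id="0000", perm="w"))  # False: masked
```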

2. Data Preparation

Azure Data Factory

  • Manages ETL and ELT workflows.
  • Supports zone-redundant data pipelines.
  • Authoring and managing pipelines requires the Data Factory Contributor role.
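ADF pipelines are defined as JSON documents made of activities. A minimal sketch, assuming two hypothetical datasets named `RawCsvDataset` and `CuratedParquetDataset` already exist in the factory, builds a pipeline with a single Copy activity that converts delimited text to Parquet:

```python
import json


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Return an ADF pipeline definition with one Copy activity that reads
    delimited text and writes Parquet. Dataset names are placeholders."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyRawToCurated",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "ParquetSink"},
                    },
                }
            ]
        },
    }


# Hypothetical pipeline and dataset names.
pipeline = copy_pipeline("ingest-raw", "RawCsvDataset", "CuratedParquetDataset")
print(json.dumps(pipeline, indent=2))
```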

Azure ML Data Labeling

  • Interface for manual annotation.
  • Supports team-based distributed labeling.
  • Requires a custom role for data labelers.
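One way to spread work across a labeling team is to give each item to several labelers and accept a label only when a majority agrees. The helpers below are hypothetical and do not call the Azure ML labeling service; they only illustrate the assignment and aggregation logic:

```python
from collections import Counter
from typing import Optional


def assign_items(item_ids: list[str], labelers: list[str],
                 per_item: int) -> dict[str, list[str]]:
    """Give each item to `per_item` different labelers, rotating the starting
    labeler so the work is spread evenly across the team."""
    if not 1 <= per_item <= len(labelers):
        raise ValueError("per_item must be between 1 and the team size")
    assignments: dict[str, list[str]] = {name: [] for name in labelers}
    for i, item_id in enumerate(item_ids):
        for j in range(per_item):
            assignments[labelers[(i + j) % len(labelers)]].append(item_id)
    return assignments


def majority_label(votes: list[str]) -> Optional[str]:
    """Return the label chosen by more than half of the labelers, or None
    when there is no strict majority (the item needs review)."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


print(assign_items(["img1", "img2", "img3"], ["alice", "bob", "carol"], per_item=2))
print(majority_label(["cat", "cat", "dog"]))  # cat
```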

3. Feature Engineering

Azure ML Feature Store

  • Stores and serves model features for training and inference.
  • Provides low-latency feature retrieval for online inference through its online store.
  • Requires the Feature Store Contributor role.
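When features are served for training, lookups must be point-in-time correct: a training row may only see feature values that existed at its event timestamp, or future data leaks into the model. A minimal sketch of that lookup over a hypothetical in-memory history (not the Azure ML feature store API):

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional

# Hypothetical offline history: entity ID -> [(timestamp, value)] sorted by time.
History = dict[str, list[tuple[datetime, float]]]


def point_in_time_lookup(history: History, entity_id: str,
                         as_of: datetime) -> Optional[float]:
    """Return the newest value observed at or before `as_of`, or None if the
    feature had no value yet for this entity."""
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)  # first position strictly after as_of
    return rows[idx - 1][1] if idx > 0 else None


history: History = {
    "customer-1": [(datetime(2024, 1, 1), 10.0), (datetime(2024, 2, 1), 12.5)],
}
print(point_in_time_lookup(history, "customer-1", datetime(2024, 1, 15)))  # 10.0
```

The binary search keeps each lookup logarithmic in the history length once timestamps are extracted; a real offline store performs the same join in bulk.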

4. Model Training

Azure ML Training Jobs

  • Supports distributed training on GPU-enabled virtual machines.
  • Creating and managing compute targets is governed by the built-in AzureML Compute Operator role.
  • Jobs access storage through a managed identity.
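In data-parallel distributed training, each worker processes a different slice of the dataset. A minimal sketch, assuming the job launcher exports `RANK` and `WORLD_SIZE` environment variables as PyTorch-style distributed launchers do, computes the sample indices one worker should read using strided sharding:

```python
import os


def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Give each worker every world_size-th sample, starting at its rank,
    so the shards are disjoint and together cover the dataset."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(range(rank, num_samples, world_size))


# Assumption: RANK/WORLD_SIZE are set by the launcher; default to one process.
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))
my_indices = shard_indices(10, rank, world_size)
print(f"rank {rank}/{world_size} reads {my_indices}")
```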

5. Model Management

Azure ML Model Registry

  • Stores versioned models and metadata.
  • Models can be encrypted using customer-managed keys.
  • Requires the ML Model Registrar role (a custom role; the built-in AzureML Data Scientist role also permits registering models).
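The registry's versioning behavior can be modeled with a small in-memory class. `InMemoryModelRegistry` is hypothetical and is not the Azure ML SDK; it only shows how repeated registrations under one name yield increasing versions that can be fetched by number or as the latest:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    uri: str
    tags: dict[str, str] = field(default_factory=dict)


class InMemoryModelRegistry:
    """Hypothetical registry: each register() call for the same name
    creates the next integer version, starting at 1."""

    def __init__(self) -> None:
        self._models: dict[str, list[ModelVersion]] = {}

    def register(self, name: str, uri: str,
                 tags: Optional[dict[str, str]] = None) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        model = ModelVersion(name, len(versions) + 1, uri, dict(tags or {}))
        versions.append(model)
        return model

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        versions = self._models[name]
        if version is None:
            return versions[-1]  # latest version
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


registry = InMemoryModelRegistry()
registry.register("churn-model", "outputs/model-v1.onnx", {"stage": "candidate"})
latest = registry.register("churn-model", "outputs/model-v2.onnx")
print(latest.version)  # 2
```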

6. Model Serving

Azure ML Online Endpoints

  • Used for real-time model inference.
  • Support autoscaling through Azure Monitor autoscale rules.
  • Access is granted via the ML Endpoint Contributor role (a custom role).
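Autoscaling for managed online endpoints is configured through Azure Monitor autoscale settings, which apply metric-based scale-out and scale-in rules between a minimum and maximum instance count. The function below is a simplified, hypothetical evaluation of one such rule pair; the 70% and 30% CPU thresholds and the 1 to 5 instance range are assumptions, and real autoscale settings also use aggregation windows and cooldown periods:

```python
def next_instance_count(current: int, avg_cpu_percent: float,
                        minimum: int = 1, maximum: int = 5,
                        scale_out_above: float = 70.0,
                        scale_in_below: float = 30.0) -> int:
    """Evaluate one scale-out and one scale-in rule, each changing the
    instance count by 1, and clamp the result to [minimum, maximum]."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1
    elif avg_cpu_percent < scale_in_below:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))


print(next_instance_count(2, 85.0))  # 3
print(next_instance_count(1, 10.0))  # 1 (already at the minimum)
```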

7. Orchestration

Azure ML Pipelines

  • Automates training and inference workflows defined as directed acyclic graphs (DAGs).
  • Available across multiple Azure regions.
  • Requires the ML Pipeline Operator role (a custom role).
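How a DAG fixes execution order and data hand-off can be shown with the standard library's `graphlib`. `run_pipeline` is a hypothetical runner, not the Azure ML pipeline SDK; each step receives its upstream outputs as keyword arguments named after the upstream steps:

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_pipeline(steps: dict[str, Callable[..., Any]],
                 depends_on: dict[str, list[str]]) -> dict[str, Any]:
    """Run every step after its upstream steps, passing each upstream
    output as a keyword argument named after that step."""
    graph = {name: set(depends_on.get(name, [])) for name in steps}
    unknown = set().union(*graph.values()) - set(steps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    outputs: dict[str, Any] = {}
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: outputs[dep] for dep in graph[name]}
        outputs[name] = steps[name](**upstream)
    return outputs


# Toy steps: the "model" is just the mean of the prepared data.
steps = {
    "prep": lambda: [3.0, 1.0, 2.0],
    "train": lambda prep: sum(prep) / len(prep),
    "evaluate": lambda prep, train: max(abs(x - train) for x in prep),
}
depends_on = {"train": ["prep"], "evaluate": ["prep", "train"]}
results = run_pipeline(steps, depends_on)
print(results["evaluate"])  # 1.0
```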

Architecture Attributes

1. Scalability

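  • ML training scales out by adding nodes to a compute cluster and scales up by selecting larger VM sizes; Blob Storage scales capacity without pre-provisioning.
  • Online endpoints can run on larger VM SKUs, including GPU SKUs, for heavy inference workloads.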

2. Performance

  • Serving models exported to ONNX with ONNX Runtime can reduce inference latency.
  • Feature Store provides low-latency feature retrieval.

3. Availability

  • Blob storage supports geo-redundant storage (GRS).
  • An Azure ML workspace and its endpoints are regional; availability across regions requires deploying them in more than one region.
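When the same model is deployed to endpoints in more than one region, a client can fail over between them if the primary region is unavailable. A minimal sketch with placeholder URLs and an injected `send` function (hypothetical; in practice it would wrap an HTTP client and raise `OSError` subclasses such as `urllib.error.URLError` on network failures), so the routing logic can be tested offline:

```python
from typing import Callable, Sequence


def call_with_failover(endpoints: Sequence[str], payload: bytes,
                       send: Callable[[str, bytes], bytes]) -> tuple[str, bytes]:
    """Try each regional endpoint in priority order and return the URL that
    answered together with its response body."""
    errors: list[str] = []
    for url in endpoints:
        try:
            return url, send(url, payload)
        except OSError as exc:  # network failure: try the next region
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all regional endpoints failed: " + "; ".join(errors))


def fake_send(url: str, payload: bytes) -> bytes:
    # Stand-in transport that pretends the primary region is down.
    if "primary" in url:
        raise ConnectionError("primary region unreachable")
    return b'{"result": "ok"}'


url, body = call_with_failover(
    ["https://primary.example.com/score", "https://secondary.example.com/score"],
    b"{}", fake_send)
print(url)  # https://secondary.example.com/score
```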

4. Security

  • Data is encrypted at rest by default; customer-managed keys can be stored in Azure Key Vault.
  • Private Link (private endpoints) and network security group (NSG) rules can isolate access to endpoints and services.

IAM Setup

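The following ARM template resource assigns the built-in Contributor role (`b24988ac-6180-42a0-ab88-20f7382dd24c`) to a service principal at the scope of the resource group being deployed to. `principalId` is the service principal's object ID, and the assignment name is derived from the scope, principal, and role so that it stays unique and stable across redeployments.

```json
{
  "type": "Microsoft.Authorization/roleAssignments",
  "apiVersion": "2022-04-01",
  "name": "[guid(resourceGroup().id, '<service-principal-id>', 'b24988ac-6180-42a0-ab88-20f7382dd24c')]",
  "properties": {
    "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'b24988ac-6180-42a0-ab88-20f7382dd24c')]",
    "principalId": "<service-principal-id>",
    "principalType": "ServicePrincipal"
  }
}
```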

CI/CD Pipeline

```mermaid
graph LR
    A[Azure Repos] --> B[Azure Pipelines]
    B -->|Build| C[Azure Container Registry]
    C -->|Deploy| D[Azure ML Training]
    D --> E[Model Registry]
    E -->|Approval| F[ML Online Endpoint]
    F --> G[Blue-Green Deployment]
```
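In a blue-green rollout on an online endpoint, the new (green) deployment runs alongside the current (blue) one, and the endpoint's traffic allocation, a percentage per deployment, is shifted in stages. The sketch below only computes the staged allocations; the deployment names and the 10/50/100 steps are assumptions, and applying each step and rolling back on regressions is left to the pipeline:

```python
def traffic_steps(old: str, new: str, increments: list[int]) -> list[dict[str, int]]:
    """Return successive traffic allocations (percent per deployment) that
    move load from `old` to `new`; every allocation sums to 100."""
    if not increments or increments != sorted(increments) or increments[-1] != 100:
        raise ValueError("increments must be non-decreasing and end at 100")
    if increments[0] < 0:
        raise ValueError("increments must be non-negative")
    return [{old: 100 - pct, new: pct} for pct in increments]


for allocation in traffic_steps("blue", "green", [10, 50, 100]):
    print(allocation)
```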

CI/CD Components:

  • Azure Pipelines: Defines multi-stage build and deploy workflows.
  • Azure Container Registry: Stores model containers.
  • Azure Policy: Used for validation and compliance during deployments.
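The Approval step between the Model Registry and the online endpoint in the CI/CD flow can be partly automated with a metric gate before a human sign-off. A minimal sketch, assuming the candidate and production models are evaluated on the same metric (the metric name `auc` and the 0.75 floor are hypothetical):

```python
from typing import Optional


def should_promote(candidate: dict[str, float],
                   baseline: Optional[dict[str, float]],
                   metric: str = "auc",
                   min_value: float = 0.75,
                   min_gain: float = 0.0) -> bool:
    """Approve a candidate if its metric clears an absolute floor and, when a
    production baseline exists, beats it by at least min_gain (0.0 = no worse)."""
    score = candidate[metric]
    if score < min_value:
        return False
    if baseline is None:
        return True  # first deployment: only the absolute floor applies
    return score - baseline[metric] >= min_gain


print(should_promote({"auc": 0.82}, {"auc": 0.80}, min_gain=0.01))  # True
print(should_promote({"auc": 0.80}, {"auc": 0.85}))                 # False
```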

Required Roles:

  • Azure DevOps Contributor (membership in the project's Contributors group) for CI
  • ML Deployment Admin (a custom role) for CD

Operational Considerations

  • Azure Monitor and Application Insights can be used for endpoint and job monitoring.
  • For production use cases, Azure DDoS Protection and Private Link should be configured.
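Endpoint monitoring usually reduces request telemetry to a few signals, such as tail latency and error rate, and alerts when they cross thresholds. A hypothetical sketch of that reduction over one window of request logs; in Azure the equivalent alert would be an Azure Monitor alert rule, and the 500 ms and 1% thresholds are assumptions:

```python
from statistics import quantiles


def summarize(latencies_ms: list[float], status_codes: list[int]) -> dict[str, float]:
    """Return p95 latency and the fraction of 5xx responses for one window.
    Assumes at least two latency samples and at least one request."""
    p95 = quantiles(latencies_ms, n=100)[94]  # the 95th of 99 cut points
    server_errors = sum(1 for code in status_codes if code >= 500)
    return {"p95_ms": p95, "error_rate": server_errors / len(status_codes)}


def should_alert(summary: dict[str, float],
                 max_p95_ms: float = 500.0,
                 max_error_rate: float = 0.01) -> bool:
    """Alert when either signal exceeds its (illustrative) threshold."""
    return summary["p95_ms"] > max_p95_ms or summary["error_rate"] > max_error_rate


window = summarize([100.0] * 19 + [1000.0], [200] * 98 + [500, 503])
print(window, should_alert(window))
```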