Azure ML Pipeline Architecture - stanlypoc/AIRA GitHub Wiki
graph TD
A[Data Sources] --> B[Azure Blob Storage]
B --> C[Azure Data Factory]
C --> D[Azure Databricks]
D --> E[Azure ML Data Labeling]
E --> F[Azure ML Feature Store]
F --> G[Azure ML Training]
G --> H[Azure ML Model Registry]
H --> I[Azure ML Online Endpoints]
I --> J[Azure Functions/App Service]
J --> K[End Users]
L[CI/CD Pipeline] -->|Triggers| G
L -->|Deploys| I
M[IAM & Security] --> AllComponents
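
The diagram above can be read as a dependency graph. The following is a minimal sketch, not part of the original design, that encodes those arrows in plain Python and derives one valid stage order with the standard library's `graphlib`. The stage names are copied from the diagram. The `DEPENDS_ON` mapping and the `stage_order` helper are illustrative names, and the IAM node is left out because it applies to every component rather than to a single step.

```python
from graphlib import TopologicalSorter

# Each key lists the stages it depends on, mirroring the arrows in the diagram.
DEPENDS_ON = {
    "Azure Blob Storage": {"Data Sources"},
    "Azure Data Factory": {"Azure Blob Storage"},
    "Azure Databricks": {"Azure Data Factory"},
    "Azure ML Data Labeling": {"Azure Databricks"},
    "Azure ML Feature Store": {"Azure ML Data Labeling"},
    "Azure ML Training": {"Azure ML Feature Store", "CI/CD Pipeline"},
    "Azure ML Model Registry": {"Azure ML Training"},
    "Azure ML Online Endpoints": {"Azure ML Model Registry", "CI/CD Pipeline"},
    "Azure Functions/App Service": {"Azure ML Online Endpoints"},
    "End Users": {"Azure Functions/App Service"},
}


def stage_order(depends_on):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


order = stage_order(DEPENDS_ON)
```

Printing `order` lists every stage after the stages it depends on, which is a quick way to check that a change to the diagram has not introduced a cycle.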
Azure Blob Storage
- Used for storing raw and processed data.
- Supports tiered storage (Hot/Cool/Archive).
- Access control via RBAC and SAS tokens.
- Example path:
https://<account>.blob.core.windows.net/ml-data/raw/
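
As a small illustration of the path convention above, this sketch builds blob URLs for a given prefix. The `blob_url` helper and the account name `mystorageacct` are hypothetical; only the host pattern, the `ml-data` container, and the `raw` prefix come from the example path.

```python
from urllib.parse import quote


def blob_url(account, container, *parts):
    """Build an HTTPS URL for a blob or virtual directory in a storage account."""
    # Percent-encode each path segment so spaces and similar characters are safe.
    path = "/".join(quote(p.strip("/")) for p in parts if p)
    return f"https://{account}.blob.core.windows.net/{container}/{path}"


# Hypothetical account name; the trailing slash marks a virtual directory.
raw_prefix = blob_url("mystorageacct", "ml-data", "raw") + "/"
```

The URL alone does not grant access. A caller still needs an RBAC role assignment or a SAS token, as noted in the list above.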
Azure Data Lake Gen2
- Optimized for big data analytics with a hierarchical namespace.
- Supports fine-grained access via POSIX ACLs and Azure RBAC.
Azure Data Factory
- Manages ETL and ELT workflows.
- Supports zone-redundant data pipelines.
- Requires the `Data Factory Contributor` role.
Azure ML Data Labeling
- Interface for manual annotation.
- Supports team-based distributed labeling.
- Requires a custom role for data labelers.
Azure ML Feature Store
- Stores and serves model features.
- Provides low-latency access.
- Requires the `Feature Store Contributor` role.
Azure ML Training Jobs
- Supports distributed training on GPU-enabled virtual machines.
- Access to training environments is controlled by the `ML Compute Operator` role.
- Storage access is handled through a managed identity.
Azure ML Model Registry
- Stores versioned models and metadata.
- Models can be encrypted using customer-managed keys.
- Requires the `ML Model Registrar` role.
Azure ML Online Endpoints
- Used for real-time model inference.
- Autoscaling supported.
- Access is granted via the `ML Endpoint Contributor` role.
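
To make real-time inference concrete, here is a hedged sketch of how a client might prepare a scoring call to an online endpoint using only the Python standard library. It assumes key-based authentication with a bearer token. The scoring URI, the key, and the payload shape are placeholders, because the actual input schema depends on the deployed scoring script. `build_scoring_request` is an illustrative helper, not an Azure SDK function.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload):
    """Prepare (but do not send) a JSON POST request for an online endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumes key-based auth; the key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; replace with the endpoint's real scoring URI and key.
request = build_scoring_request(
    "https://<endpoint-name>.<region>.inference.ml.azure.com/score",
    "<endpoint-key>",
    {"data": [[1.0, 2.0, 3.0]]},  # hypothetical input schema
)
```

Passing `request` to `urllib.request.urlopen` would send the call. The sketch stops short of that so it can be inspected without a live endpoint.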
Azure ML Pipelines
- Automates training and inference workflows using directed acyclic graphs (DAG).
- Available across multiple Azure regions.
- Requires the `ML Pipeline Operator` role.
- Blob Storage and ML training scale horizontally and vertically.
- Endpoints support high-spec compute instances for heavy inference workloads.
- Converting models to ONNX and serving them with ONNX Runtime can reduce inference latency.
- Feature Store provides low-latency feature retrieval.
- Blob storage supports geo-redundant storage (GRS).
- ML services and components are distributed across regions.
- Data is encrypted at rest, with encryption keys managed in Azure Key Vault.
- Private Link and NSG rules can isolate access to endpoints and services.
{
  "type": "Microsoft.Authorization/roleAssignments",
  "apiVersion": "2022-04-01",
  "name": "[guid(resourceGroup().id)]",
  "properties": {
    "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'b24988ac-6180-42a0-ab88-20f7382dd24c')]",
    "principalId": "<service-principal-id>",
    "scope": "[resourceGroup().id]"
  }
}
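
When several principals need the same role, the template resource above can be generated rather than copied by hand. The sketch below is one possible approach, under two assumptions. First, it adds the principal ID and role ID to the `guid()` seed so each assignment gets a distinct name. Second, it references the role definition with `subscriptionResourceId`, the form generally used for built-in roles. The `role_assignment` helper and the principal IDs are hypothetical.

```python
import json

# Role definition ID taken from the template above (the built-in Contributor role).
CONTRIBUTOR_ROLE_ID = "b24988ac-6180-42a0-ab88-20f7382dd24c"


def role_assignment(principal_id, role_id=CONTRIBUTOR_ROLE_ID):
    """Return an ARM role-assignment resource for one principal as a dict."""
    return {
        "type": "Microsoft.Authorization/roleAssignments",
        "apiVersion": "2022-04-01",
        # Seeding guid() with principal and role keeps names unique per assignment.
        "name": f"[guid(resourceGroup().id, '{principal_id}', '{role_id}')]",
        "properties": {
            "roleDefinitionId": (
                "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', "
                f"'{role_id}')]"
            ),
            "principalId": principal_id,
            "scope": "[resourceGroup().id]",
        },
    }


# Hypothetical principal IDs, used only to show the generated output.
resources = [role_assignment(p) for p in ("<principal-a>", "<principal-b>")]
template_fragment = json.dumps(resources, indent=2)
```

The resulting `template_fragment` can be pasted into the `resources` array of an ARM template. Scoping each assignment to the narrowest role that works is usually preferable to Contributor.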
graph LR
A[Azure Repos] --> B[Azure Pipelines]
B -->|Build| C[Azure Container Registry]
C -->|Deploy| D[Azure ML Training]
D --> E[Model Registry]
E -->|Approval| F[ML Online Endpoint]
F --> G[Blue-Green Deployment]
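
The blue-green step at the end of this flow usually means shifting traffic from the current deployment to the new one in stages. The following sketch models that as a mapping from deployment name to traffic percentage, which is how Azure ML online endpoints express a traffic split. The deployment names `blue` and `green`, the step size, and both helper functions are illustrative assumptions rather than part of the documented pipeline.

```python
def shift_traffic(traffic, old, new, step):
    """Move up to `step` percentage points from `old` to `new`; total stays 100."""
    if step <= 0:
        raise ValueError("step must be positive")
    moved = min(step, traffic.get(old, 0))
    updated = dict(traffic)  # do not mutate the caller's mapping
    updated[old] = traffic.get(old, 0) - moved
    updated[new] = traffic.get(new, 0) + moved
    return updated


def rollout_plan(old="blue", new="green", step=25):
    """Return the sequence of traffic splits from 100% old to 100% new."""
    traffic = {old: 100, new: 0}
    plan = [traffic]
    while traffic[old] > 0:
        traffic = shift_traffic(traffic, old, new, step)
        plan.append(traffic)
    return plan


plan = rollout_plan(step=50)
```

In a real pipeline, each entry in `plan` would be applied to the endpoint only after health checks pass on the previous split. Rolling back means reapplying an earlier entry.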
CI/CD Components:
- Azure Pipelines: Defines multi-stage build and deploy workflows.
- Azure Container Registry: Stores model containers.
- Azure Policy: Used for validation and compliance during deployments.
Required Roles:
- `Azure DevOps Contributor` for CI
- `ML Deployment Admin` for CD
- Azure Monitor and Application Insights can be used for endpoint and job monitoring.
- For production use cases, Azure DDoS Protection and Private Link should be configured.