Governance Policies Integration Data Pipeline - Azure/az-prototype GitHub Wiki
Governance policies for Data Pipeline
Domain: integration
| Name | Description |
|---|---|
| Data Factory to SQL and Storage with managed identity | ADF with managed VNet IR, managed private endpoints to SQL and Storage, Key Vault for secrets |
| Synapse with ADLS Gen2 data lake | Synapse workspace with ADLS Gen2 default storage, managed private endpoints, and Storage Blob Data Contributor role |
| Databricks with Key Vault secret scope | Databricks Premium workspace with Key Vault-backed secret scope via REST API, RBAC authorization |
| Anti-pattern | Use instead |
|---|---|
| Do not store credentials in Data Factory or Synapse linked service definitions | Use managed identity with RBAC roles or Key Vault references for all data source connections |
| Do not use public endpoints for data movement between services | Use managed private endpoints (ADF/Synapse) or private endpoints with private DNS zones |
| Do not use Databricks-managed secret scopes in production | Use Key Vault-backed secret scopes with RBAC authorization and audit logging |
- Data Factory managed private endpoints
- Synapse managed private endpoints
- Databricks Key Vault-backed secret scopes
- Data Factory identity-based authentication
| Check | Severity | Description |
|---|---|---|
| CC-INT-DP-001 | Required | Configure Data Factory linked services to SQL Database and Storage using managed identity — never stored credentials |
| CC-INT-DP-002 | Required | Configure Synapse Workspace with ADLS Gen2 default data lake using managed identity and managed private endpoints |
| CC-INT-DP-003 | Required | Configure Databricks workspace with Key Vault-backed secret scope for secure credential access |
| CC-INT-DP-004 | Required | Enforce encryption in transit for all cross-service data movement using private endpoints and TLS 1.2+ |
Configure Data Factory linked services to SQL Database and Storage using managed identity — never stored credentials
Severity: Required
Rationale: Managed identity eliminates credential rotation burden and prevents secret sprawl across linked services
Agents: cloud-architect, terraform-agent, bicep-agent, app-developer, csharp-developer, python-developer
- Microsoft.DataFactory/factories
- Microsoft.Synapse/workspaces
- Microsoft.Databricks/workspaces
- Microsoft.Storage/storageAccounts
- Microsoft.Sql/servers
| Resource | Name | Purpose |
|---|---|---|
| Microsoft.DataFactory/factories/managedVirtualNetworks | default | Managed VNet for ADF to enable managed private endpoints |
| Microsoft.DataFactory/factories/integrationRuntimes | ManagedVNetIR | Managed VNet integration runtime — all data movement stays on Azure backbone |
| Microsoft.DataFactory/factories/managedVirtualNetworks/managedPrivateEndpoints | mpe-sql | Managed private endpoint from ADF to SQL Database |
| Microsoft.Authorization/roleAssignments | adf-storage-contributor | Storage Blob Data Contributor role for ADF managed identity |
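
The stored-credential rule above can be checked mechanically against linked service JSON. The following is a minimal sketch, not the project's own validator: the function names, the `SECRET_KEYS` set, and the `SECRET_MARKERS` tuple are hypothetical, and the sample linked services use placeholder server and account names. It assumes the usual ADF layout in which settings live under `properties.typeProperties` and a Key Vault reference is an object with `"type": "AzureKeyVaultSecret"`.

```python
from typing import Any

# Hypothetical set of typeProperties keys that usually carry an inline secret.
SECRET_KEYS = {"accountKey", "sasUri", "sasToken", "password", "servicePrincipalKey"}
# Hypothetical substrings that suggest a secret embedded in a connection string.
SECRET_MARKERS = ("password=", "pwd=", "accountkey=", "sharedaccesssignature=")


def _is_key_vault_reference(value: Any) -> bool:
    """True when the value points at a Key Vault secret instead of holding one."""
    return isinstance(value, dict) and value.get("type") == "AzureKeyVaultSecret"


def find_stored_credentials(linked_service: dict[str, Any]) -> list[str]:
    """Return findings; an empty list means no inline secret was spotted."""
    props = linked_service.get("properties", {}).get("typeProperties", {})
    findings = []
    for key, value in props.items():
        if key in SECRET_KEYS and not _is_key_vault_reference(value):
            findings.append(f"{key}: inline secret")
        elif isinstance(value, str) and any(m in value.lower() for m in SECRET_MARKERS):
            findings.append(f"{key}: secret embedded in string")
    return findings


# Placeholder linked service that relies on the factory's managed identity.
managed_identity_sql = {
    "name": "ls-sql",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Data Source=tcp:example.database.windows.net,1433;"
                "Initial Catalog=exampledb;"
            )
        },
        "connectVia": {
            "referenceName": "ManagedVNetIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

# Placeholder linked service that embeds an account key (the anti-pattern).
stored_key_storage = {
    "name": "ls-blob",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;AccountName=example;AccountKey=abc123"
            )
        },
    },
}

print(find_stored_credentials(managed_identity_sql))
print(find_stored_credentials(stored_key_storage))
```

A check like this only catches obvious cases. It does not confirm that the managed identity actually holds the required RBAC role on the target.
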
Configure Synapse Workspace with ADLS Gen2 default data lake using managed identity and managed private endpoints
Severity: Required
Rationale: Synapse requires a default data lake for workspace artifacts; managed identity eliminates storage account keys in configuration
Agents: cloud-architect, terraform-agent, bicep-agent
- Microsoft.DataFactory/factories
- Microsoft.Synapse/workspaces
- Microsoft.Databricks/workspaces
- Microsoft.Storage/storageAccounts
- Microsoft.Sql/servers
| Resource | Name | Purpose |
|---|---|---|
| Microsoft.Storage/storageAccounts | synapse-data-lake | ADLS Gen2 storage account (isHnsEnabled: true) for Synapse default data lake |
| Microsoft.Authorization/roleAssignments | synapse-storage-contributor | Storage Blob Data Contributor role for Synapse managed identity on data lake |
| Microsoft.Synapse/workspaces/managedVirtualNetworks/managedPrivateEndpoints | mpe-datalake-dfs | Managed private endpoint for Synapse to access data lake DFS endpoint |
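
As an illustration of the companion resources above, the sketch below checks that a storage account has the hierarchical namespace enabled and builds a role assignment for the Synapse managed identity. It is a rough sketch under stated assumptions: the function names and the deterministic naming scheme (`uuid5` over scope, principal, and role) are hypothetical choices, and the dictionaries only mirror the general shape of ARM resources rather than a complete template. The role definition ID is the built-in one for Storage Blob Data Contributor.

```python
import uuid

# Built-in role definition ID for "Storage Blob Data Contributor".
STORAGE_BLOB_DATA_CONTRIBUTOR = "ba92f5b4-2d11-453d-a403-e96b0029c9fe"


def is_adls_gen2(storage_account: dict) -> bool:
    """True when the account has the hierarchical namespace (ADLS Gen2) enabled."""
    return storage_account.get("properties", {}).get("isHnsEnabled") is True


def build_role_assignment(
    subscription_id: str, storage_account_id: str, principal_id: str
) -> dict:
    """Build an ARM-style role assignment scoped to the data lake account.

    The name is derived deterministically (a hypothetical scheme) so that
    repeated deployments produce the same assignment instead of duplicates.
    """
    seed = f"{storage_account_id}|{principal_id}|{STORAGE_BLOB_DATA_CONTRIBUTOR}"
    return {
        "type": "Microsoft.Authorization/roleAssignments",
        "name": str(uuid.uuid5(uuid.NAMESPACE_URL, seed)),
        "scope": storage_account_id,
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}/providers/"
                f"Microsoft.Authorization/roleDefinitions/{STORAGE_BLOB_DATA_CONTRIBUTOR}"
            ),
            "principalId": principal_id,
            # Managed identities are assigned as service principals.
            "principalType": "ServicePrincipal",
        },
    }
```

Setting `principalType` explicitly is a common way to avoid a failed assignment when the identity has only just been created and has not yet replicated in the directory.
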
Configure Databricks workspace with Key Vault-backed secret scope for secure credential access
Severity: Required
Rationale: Databricks secret scopes backed by Key Vault centralize secret management; Databricks-managed scopes lack audit trail and rotation
Agents: cloud-architect, terraform-agent, bicep-agent, app-developer, csharp-developer, python-developer
- Microsoft.DataFactory/factories
- Microsoft.Synapse/workspaces
- Microsoft.Databricks/workspaces
- Microsoft.Storage/storageAccounts
- Microsoft.Sql/servers
| Resource | Name | Purpose |
|---|---|---|
| Microsoft.KeyVault/vaults | dbr-key-vault | Key Vault with RBAC authorization for Databricks secret scope backing store |
| Microsoft.Network/privateEndpoints | pe-dbr-kv | Private endpoint for Key Vault access from Databricks VNet |
| Microsoft.Authorization/roleAssignments | dbr-kv-secrets-user | Key Vault Secrets User role (4633458b) for Databricks workspace identity |
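
A Key Vault-backed scope is created through the Databricks Secrets REST API rather than through ARM. The sketch below only assembles the request URL and JSON body; it sends nothing. The function name and every argument value are hypothetical, and the field names (`scope_backend_type`, `backend_azure_keyvault`, `resource_id`, `dns_name`) and the `/api/2.0/secrets/scopes/create` path reflect the Secrets API as commonly documented, so verify them against the current API reference before relying on them.

```python
import json


def build_scope_request(
    workspace_url: str, scope_name: str, vault_resource_id: str, vault_dns_name: str
) -> tuple[str, str]:
    """Return (url, json_body) for creating a Key Vault-backed secret scope."""
    if not vault_dns_name.startswith("https://"):
        raise ValueError("Key Vault DNS name must be an https:// URI")
    body = {
        "scope": scope_name,
        # Selects Key Vault as the backing store instead of a Databricks-managed scope.
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": vault_resource_id,
            "dns_name": vault_dns_name,
        },
    }
    url = f"{workspace_url.rstrip('/')}/api/2.0/secrets/scopes/create"
    return url, json.dumps(body)
```

The caller would send the body with an HTTP POST and a Microsoft Entra ID token for the workspace; that part is omitted here because it depends on how the deployment authenticates.
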
Enforce encryption in transit for all cross-service data movement using private endpoints and TLS 1.2+
Severity: Required
Rationale: Data in transit between services must be encrypted and routed privately to prevent interception and exfiltration
Agents: cloud-architect, terraform-agent, bicep-agent, security-reviewer
- Microsoft.DataFactory/factories
- Microsoft.Synapse/workspaces
- Microsoft.Databricks/workspaces
- Microsoft.Storage/storageAccounts
- Microsoft.Sql/servers
| Resource | Name | Purpose |
|---|---|---|
| Microsoft.Network/privateEndpoints | pe-sql | Private endpoint for SQL Server with groupId 'sqlServer' |
| Microsoft.Network/privateDnsZones | privatelink.database.windows.net | Private DNS zone for SQL Server private endpoint resolution |
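
The transit requirements can be expressed as a small check over resource definitions. This is a minimal sketch, assuming ARM-style dictionaries with the property names `minimalTlsVersion` (SQL server), `minimumTlsVersion` and `supportsHttpsTrafficOnly` (storage account), and `publicNetworkAccess` (both). The function name and the choice to treat a missing `publicNetworkAccess` as a finding are hypothetical. The mapping lists the private DNS zone that corresponds to each private endpoint `groupId` used in this policy.

```python
# Private endpoint groupId -> private DNS zone (Azure public cloud).
PRIVATE_DNS_ZONES = {
    "sqlServer": "privatelink.database.windows.net",
    "blob": "privatelink.blob.core.windows.net",
    "dfs": "privatelink.dfs.core.windows.net",
    "vault": "privatelink.vaultcore.azure.net",
}


def transit_findings(resource: dict) -> list[str]:
    """Return encryption-in-transit findings for one SQL server or storage account."""
    rtype = resource.get("type")
    props = resource.get("properties", {})
    findings = []
    if rtype == "Microsoft.Sql/servers":
        if props.get("minimalTlsVersion") not in ("1.2", "1.3"):
            findings.append("SQL server allows TLS below 1.2")
    elif rtype == "Microsoft.Storage/storageAccounts":
        if props.get("minimumTlsVersion") not in ("TLS1_2", "TLS1_3"):
            findings.append("storage account allows TLS below 1.2")
        if props.get("supportsHttpsTrafficOnly") is not True:
            findings.append("storage account accepts non-HTTPS traffic")
    # Hypothetical strictness: anything other than an explicit "Disabled" is flagged.
    if props.get("publicNetworkAccess") != "Disabled":
        findings.append("public network access is not disabled")
    return findings


compliant_sql = {
    "type": "Microsoft.Sql/servers",
    "properties": {"minimalTlsVersion": "1.2", "publicNetworkAccess": "Disabled"},
}
weak_storage = {
    "type": "Microsoft.Storage/storageAccounts",
    "properties": {
        "minimumTlsVersion": "TLS1_0",
        "supportsHttpsTrafficOnly": True,
        "publicNetworkAccess": "Enabled",
    },
}

print(transit_findings(compliant_sql))
print(transit_findings(weak_storage))
```

A private endpoint only takes effect for name resolution when the matching zone from `PRIVATE_DNS_ZONES` is linked to the virtual network, which is why the policy pairs `pe-sql` with `privatelink.database.windows.net`.
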