Governance Policies Integration Data Pipeline - Azure/az-prototype GitHub Wiki

Data Pipeline

Governance policies for Data Pipeline

Domain: integration

Patterns

| Name | Description |
| --- | --- |
| Data Factory to SQL and Storage with managed identity | ADF with managed VNet IR, managed private endpoints to SQL and Storage, Key Vault for secrets |
| Synapse with ADLS Gen2 data lake | Synapse workspace with ADLS Gen2 default storage, managed private endpoints, and Storage Blob Data Contributor role |
| Databricks with Key Vault secret scope | Databricks Premium workspace with Key Vault-backed secret scope via REST API, RBAC authorization |

Anti-Patterns

| Description | Instead |
| --- | --- |
| Do not store credentials in Data Factory or Synapse linked service definitions | Use managed identity with RBAC roles or Key Vault references for all data source connections |
| Do not use public endpoints for data movement between services | Use managed private endpoints (ADF/Synapse) or private endpoints with private DNS zones |
| Do not use Databricks-managed secret scopes in production | Use Key Vault-backed secret scopes with RBAC authorization and audit logging |

Checks (4)

| Check | Severity | Description |
| --- | --- | --- |
| CC-INT-DP-001 | Required | Configure Data Factory linked services to SQL Database and Storage using managed identity, never stored credentials |
| CC-INT-DP-002 | Required | Configure Synapse Workspace with ADLS Gen2 default data lake using managed identity and managed private endpoints |
| CC-INT-DP-003 | Required | Configure Databricks workspace with Key Vault-backed secret scope for secure credential access |
| CC-INT-DP-004 | Required | Enforce encryption in transit for all cross-service data movement using private endpoints and TLS 1.2+ |

CC-INT-DP-001

Configure Data Factory linked services to SQL Database and Storage using managed identity, never stored credentials

Severity: Required
Rationale: Managed identity eliminates credential rotation burden and prevents secret sprawl across linked services
Agents: cloud-architect, terraform-agent, bicep-agent, app-developer, csharp-developer, python-developer

Targets

  • Microsoft.DataFactory/factories
  • Microsoft.Synapse/workspaces
  • Microsoft.Databricks/workspaces
  • Microsoft.Storage/storageAccounts
  • Microsoft.Sql/servers

Companion Resources

| Resource | Name | Purpose |
| --- | --- | --- |
| Microsoft.DataFactory/factories/managedVirtualNetworks | default | Managed VNet for ADF to enable managed private endpoints |
| Microsoft.DataFactory/factories/integrationRuntimes | ManagedVNetIR | Managed VNet integration runtime; all data movement stays on the Azure backbone |
| Microsoft.DataFactory/factories/managedVirtualNetworks/managedPrivateEndpoints | mpe-sql | Managed private endpoint from ADF to SQL Database |
| Microsoft.Authorization/roleAssignments | adf-storage-contributor | Storage Blob Data Contributor role for ADF managed identity |
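
As a rough illustration of CC-INT-DP-001, the sketch below shows two linked service definitions that rely on the factory's managed identity, plus a small checker that flags inline secrets. The resource names (`ls_sql`, `ls_storage`, `example-server`, `examplestorage`), the helpers `is_key_vault_reference` and `find_stored_credentials`, and the two deny-lists are assumptions made for this example. They are not part of the policy, and the lists are not exhaustive.

```python
from __future__ import annotations

# Hypothetical deny-lists for this sketch; real linked service types expose more properties.
SECRET_PROPERTIES = {"password", "accountKey", "sasUri", "sasToken", "servicePrincipalKey"}
SECRET_KEYWORDS = ("password=", "pwd=", "accountkey=", "sharedaccesssignature=")


def is_key_vault_reference(value: object) -> bool:
    """True when the value points at a Key Vault secret instead of holding one."""
    return isinstance(value, dict) and value.get("type") == "AzureKeyVaultSecret"


def find_stored_credentials(linked_service: dict) -> list[str]:
    """Return findings for one linked service; an empty list means none were detected."""
    findings = []
    type_props = linked_service.get("properties", {}).get("typeProperties", {})
    for name, value in type_props.items():
        if name in SECRET_PROPERTIES and not is_key_vault_reference(value):
            findings.append(f"inline secret in '{name}'")
        if name == "connectionString" and isinstance(value, str):
            compact = value.lower().replace(" ", "")
            if any(keyword in compact for keyword in SECRET_KEYWORDS):
                findings.append("credential embedded in connectionString")
    return findings


# Managed identity: the connection string names the server and database only.
ls_sql = {
    "name": "ls_sql",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Data Source=tcp:example-server.database.windows.net,1433;"
                "Initial Catalog=exampledb;Connection Timeout=30"
            )
        },
    },
}

# Managed identity: only the service endpoint is given, no key or SAS.
ls_storage = {
    "name": "ls_storage",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"serviceEndpoint": "https://examplestorage.blob.core.windows.net/"},
    },
}

# Anti-pattern: SQL authentication with the password stored in the definition.
ls_sql_bad = {
    "name": "ls_sql_bad",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Data Source=tcp:example-server.database.windows.net,1433;"
                "Initial Catalog=exampledb;User ID=exampleuser;Password=example"
            )
        },
    },
}

for service in (ls_sql, ls_storage, ls_sql_bad):
    print(service["name"], find_stored_credentials(service))
```

A checker like this only catches obvious cases. It does not verify that the managed identity actually holds the required RBAC role on the target resource.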

CC-INT-DP-002

Configure Synapse Workspace with ADLS Gen2 default data lake using managed identity and managed private endpoints

Severity: Required
Rationale: Synapse requires a default data lake for workspace artifacts; managed identity eliminates storage account keys in configuration
Agents: cloud-architect, terraform-agent, bicep-agent

Targets

  • Microsoft.DataFactory/factories
  • Microsoft.Synapse/workspaces
  • Microsoft.Databricks/workspaces
  • Microsoft.Storage/storageAccounts
  • Microsoft.Sql/servers

Companion Resources

| Resource | Name | Purpose |
| --- | --- | --- |
| Microsoft.Storage/storageAccounts | synapse-data-lake | ADLS Gen2 storage account (isHnsEnabled: true) for Synapse default data lake |
| Microsoft.Authorization/roleAssignments | synapse-storage-contributor | Storage Blob Data Contributor role for Synapse managed identity on data lake |
| Microsoft.Synapse/workspaces/managedVirtualNetworks/managedPrivateEndpoints | mpe-datalake-dfs | Managed private endpoint for Synapse to access data lake DFS endpoint |
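
The companion resources for CC-INT-DP-002 can be sketched as ARM-style dictionaries with a validator that checks how they fit together. This is a minimal sketch: the function `check_synapse_data_lake`, the account name `exampledatalake`, the filesystem name, and the `<subscription-id>` placeholder are hypothetical, and only a handful of properties are inspected.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Built-in role definition ID for Storage Blob Data Contributor.
STORAGE_BLOB_DATA_CONTRIBUTOR = "ba92f5b4-2d11-453d-a403-e96b0029c9fe"


def check_synapse_data_lake(workspace: dict, storage: dict, role_assignment: dict) -> list[str]:
    """Return findings for the workspace, its data lake, and the role assignment."""
    findings = []

    # ADLS Gen2 requires the hierarchical namespace to be enabled.
    if storage.get("properties", {}).get("isHnsEnabled") is not True:
        findings.append("storage account does not have isHnsEnabled set to true")

    ws_props = workspace.get("properties", {})
    account_url = ws_props.get("defaultDataLakeStorage", {}).get("accountUrl", "")
    host = urlparse(account_url).hostname or ""
    if not host.endswith(".dfs.core.windows.net"):
        findings.append("default data lake does not use the DFS endpoint")

    # Managed private endpoints need the workspace managed virtual network.
    if ws_props.get("managedVirtualNetwork") != "default":
        findings.append("workspace managed virtual network is not enabled")

    if "SystemAssigned" not in workspace.get("identity", {}).get("type", ""):
        findings.append("workspace has no system-assigned managed identity")

    role_id = role_assignment.get("properties", {}).get("roleDefinitionId", "")
    if not role_id.endswith(STORAGE_BLOB_DATA_CONTRIBUTOR):
        findings.append("role assignment is not Storage Blob Data Contributor")

    return findings


storage = {
    "type": "Microsoft.Storage/storageAccounts",
    "name": "exampledatalake",
    "kind": "StorageV2",
    "properties": {"isHnsEnabled": True},
}

workspace = {
    "type": "Microsoft.Synapse/workspaces",
    "identity": {"type": "SystemAssigned"},
    "properties": {
        "defaultDataLakeStorage": {
            "accountUrl": "https://exampledatalake.dfs.core.windows.net",
            "filesystem": "workspace",
        },
        "managedVirtualNetwork": "default",
    },
}

role_assignment = {
    "type": "Microsoft.Authorization/roleAssignments",
    "properties": {
        "roleDefinitionId": (
            "/subscriptions/<subscription-id>/providers/Microsoft.Authorization/"
            "roleDefinitions/" + STORAGE_BLOB_DATA_CONTRIBUTOR
        ),
        "principalType": "ServicePrincipal",
    },
}

print(check_synapse_data_lake(workspace, storage, role_assignment))
```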

CC-INT-DP-003

Configure Databricks workspace with Key Vault-backed secret scope for secure credential access

Severity: Required
Rationale: Databricks secret scopes backed by Key Vault centralize secret management; Databricks-managed scopes lack audit trail and rotation
Agents: cloud-architect, terraform-agent, bicep-agent, app-developer, csharp-developer, python-developer

Targets

  • Microsoft.DataFactory/factories
  • Microsoft.Synapse/workspaces
  • Microsoft.Databricks/workspaces
  • Microsoft.Storage/storageAccounts
  • Microsoft.Sql/servers

Companion Resources

| Resource | Name | Purpose |
| --- | --- | --- |
| Microsoft.KeyVault/vaults | dbr-key-vault | Key Vault with RBAC authorization for Databricks secret scope backing store |
| Microsoft.Network/privateEndpoints | pe-dbr-kv | Private endpoint for Key Vault access from Databricks VNet |
| Microsoft.Authorization/roleAssignments | dbr-kv-secrets-user | Key Vault Secrets User role (4633458b) for Databricks workspace identity |
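
The "secret scope via REST API" step in CC-INT-DP-003 might look like the sketch below, which builds (but does not send) a request to the Databricks Secrets API `scopes/create` endpoint. The function name `build_secret_scope_request`, the workspace URL, the scope name, the `<subscription-id>` and `<resource-group>` placeholders, and the token value are all hypothetical. The sketch also assumes the caller already holds a token the workspace will accept for creating Key Vault-backed scopes.

```python
import json
import urllib.request


def build_secret_scope_request(
    workspace_url: str,
    scope_name: str,
    key_vault_resource_id: str,
    key_vault_dns_name: str,
    token: str,
) -> urllib.request.Request:
    """Build a POST request that would create a Key Vault-backed secret scope."""
    payload = {
        "scope": scope_name,
        # AZURE_KEYVAULT selects a Key Vault-backed scope instead of a Databricks-backed one.
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": key_vault_resource_id,
            "dns_name": key_vault_dns_name,
        },
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/secrets/scopes/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_secret_scope_request(
    workspace_url="https://adb-example.azuredatabricks.net/",
    scope_name="example-kv-scope",
    key_vault_resource_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.KeyVault/vaults/dbr-key-vault"
    ),
    key_vault_dns_name="https://dbr-key-vault.vault.azure.net/",
    token="<access-token>",
)

# Inspect the request without sending it; urllib.request.urlopen(request) would submit it.
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Building the request separately from sending it keeps the payload easy to review and test. In practice the token would come from a credential provider rather than a literal string.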

CC-INT-DP-004

Enforce encryption in transit for all cross-service data movement using private endpoints and TLS 1.2+

Severity: Required
Rationale: Data in transit between services must be encrypted and routed privately to prevent interception and exfiltration
Agents: cloud-architect, terraform-agent, bicep-agent, security-reviewer

Targets

  • Microsoft.DataFactory/factories
  • Microsoft.Synapse/workspaces
  • Microsoft.Databricks/workspaces
  • Microsoft.Storage/storageAccounts
  • Microsoft.Sql/servers

Companion Resources

| Resource | Name | Purpose |
| --- | --- | --- |
| Microsoft.Network/privateEndpoints | pe-sql | Private endpoint for SQL Server with groupId 'sqlServer' |
| Microsoft.Network/privateDnsZones | privatelink.database.windows.net | Private DNS zone for SQL Server private endpoint resolution |
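
One way to picture CC-INT-DP-004 is a small check over ARM-style resource definitions that confirms a minimum TLS version of 1.2 and disabled public network access. The function `check_transit_settings` and the sample resources are hypothetical. The sketch covers only storage accounts and SQL servers, and it does not verify that a private endpoint or DNS zone actually exists.

```python
from __future__ import annotations


def check_transit_settings(resource: dict) -> list[str]:
    """Return findings for one storage account or SQL server definition."""
    findings = []
    resource_type = resource.get("type")
    props = resource.get("properties", {})

    if resource_type == "Microsoft.Storage/storageAccounts":
        # Storage accounts express the TLS floor as TLS1_0, TLS1_1, TLS1_2, ...
        if props.get("minimumTlsVersion") not in ("TLS1_2", "TLS1_3"):
            findings.append("storage account allows TLS below 1.2")
        if props.get("supportsHttpsTrafficOnly") is False:
            findings.append("storage account accepts plain HTTP")
    elif resource_type == "Microsoft.Sql/servers":
        # SQL servers use a different property name and value format.
        if props.get("minimalTlsVersion") not in ("1.2", "1.3"):
            findings.append("SQL server allows TLS below 1.2")
    else:
        return [f"unsupported resource type: {resource_type}"]

    # Traffic should arrive through private endpoints only.
    if props.get("publicNetworkAccess") != "Disabled":
        findings.append("public network access is not disabled")
    return findings


sql_server = {
    "type": "Microsoft.Sql/servers",
    "properties": {"minimalTlsVersion": "1.2", "publicNetworkAccess": "Disabled"},
}

storage_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "properties": {
        "minimumTlsVersion": "TLS1_0",
        "supportsHttpsTrafficOnly": True,
        "publicNetworkAccess": "Enabled",
    },
}

print(check_transit_settings(sql_server))
print(check_transit_settings(storage_account))
```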
