Copilot Summary - razmipatel/Random GitHub Wiki
Note: AI-generated summary. Please validate technical accuracy.
- Reviewed overall project scope and the need to resolve key technical questions before the workshop.
- Onboarding to AIBE is ongoing.
- Clear success criteria must be defined.
- Emphasis on removing blockers prior to engaging a wider stakeholder group.
- Azure-based architecture presented, including:
- VPN / ExpressRoute connectivity
- Azure Blob Storage for raw data
- Medallion architecture (Bronze / Silver / Gold)
- No public IPs assumed; all traffic routed over the Microsoft backbone.
- Network topology includes:
- Control plane, compute plane, data plane
- VNet injection
- Dedicated subnets for clusters and relay services
- Private Endpoints for Azure PaaS services
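A subnet layout like the one above can be sanity-checked before it goes into infrastructure code. The sketch below is only illustrative: the VNet range, the subnet names and all CIDRs are hypothetical placeholders, not values agreed in the meeting. It checks that each subnet sits inside the VNet and that no two subnets overlap.

```python
import ipaddress

# Hypothetical address plan: none of these CIDRs come from the meeting notes.
VNET_CIDR = "10.20.0.0/22"
SUBNETS = {
    "clusters": "10.20.0.0/24",
    "relay": "10.20.1.0/24",
    "private-endpoints": "10.20.2.0/27",
}


def validate_plan(vnet_cidr, subnets):
    """Return a list of problems: subnets outside the VNet or overlapping each other."""
    vnet = ipaddress.ip_network(vnet_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []

    # Every subnet must be fully contained in the VNet range.
    for name, net in nets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")

    # No pair of subnets may share addresses.
    names = sorted(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                problems.append(f"{first} overlaps {second}")

    return problems
```

An empty list from `validate_plan(VNET_CIDR, SUBNETS)` means the plan is internally consistent; it says nothing about Databricks or AIB sizing requirements, which still need to be confirmed.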
- Clarification required on the relay agent role in the host subnet.
- Private connectivity via Azure Private Link; authentication via Microsoft Entra ID.
- Confirmed that authentication tokens remain on Microsoft’s backbone.
- Proposed core user groups:
- Workspace Admins
- Data Scientists
- Data Engineers
- Explored how the Databricks control plane connects to customer VNets and Azure services.
- Open questions around:
- Azure Firewall routing
- User-defined route (UDR) usage
- Whether logs / Hive Metastore require public IPs
- Azure Databricks confirmed as a first-party Microsoft service; further validation of FQDN/IP requirements requested.
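One way to support the FQDN/IP validation is to check, from inside the network, whether a given hostname resolves to private or public addresses. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the hostname must be supplied by whoever runs it, since the notes do not record the workspace URL.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def classify(addresses):
    """Split a list of IP address strings into private and public lists."""
    result = {"private": [], "public": []}
    for addr in addresses:
        kind = "private" if ipaddress.ip_address(addr).is_private else "public"
        result[kind].append(addr)
    return result
```

Running `classify(resolve("<workspace hostname>"))` from a VM in the target VNet would show whether DNS hands back a Private Endpoint address or a public one. Any public entries are candidates for firewall rules and should be raised with Databricks.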
- Required services:
- Azure Data Lake Storage (ADLS)
- Azure Key Vault (AKV)
- Databricks clusters require AKV access for secrets.
- No requirement for ADF or other PaaS services at this stage.
- Classic compute:
- Control plane does not access ADLS directly.
- Compute plane accesses storage via Managed Identity.
- Serverless compute:
- Requires Private Link connectivity from Databricks cloud account.
- Planned for later phases.
- Preferred approach is ADLS for governance and simplicity.
- Copying data to Blob Storage is possible if required.
- MVP decision:
- Single container for Bronze / Silver / Gold layers.
- Option to separate later as scale increases.
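To illustrate the single-container MVP layout, the sketch below builds ADLS paths with one top-level folder per medallion layer. The storage account name, container name and folder convention are hypothetical placeholders; only the `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI shape is standard for ADLS Gen2.

```python
# Hypothetical names: the notes do not specify an account or container.
STORAGE_ACCOUNT = "examplelake"
CONTAINER = "medallion"
LAYERS = ("bronze", "silver", "gold")


def layer_path(layer, dataset, account=STORAGE_ACCOUNT, container=CONTAINER):
    """Build an abfss:// URI for a dataset inside one medallion layer folder."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    # One container holds all layers; the layer is the first path segment.
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{layer}/{dataset.strip('/')}"
    )
```

Keeping path construction in one helper means a later move to one container per layer would only change this function, not every notebook or job that reads the data.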
- Databricks vs Entra ID service principals discussed.
- No limitations identified with Entra ID service principals.
- Least-privilege RBAC reviewed for:
- ADLS
- Key Vault
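A least-privilege review can be partly automated by scanning planned role assignments for roles that are broader than data access requires. In the sketch below, the role names are Azure built-in roles, but the group-to-role mapping and the choice of which roles count as too broad are assumptions for illustration, not decisions from the meeting.

```python
# Built-in Azure roles treated as too broad for data access (an assumption).
BROAD_ROLES = {
    "Owner",
    "Contributor",
    "Storage Blob Data Owner",
    "Key Vault Administrator",
}

# Hypothetical plan: (group, resource, role). Not agreed in the meeting.
ASSIGNMENTS = [
    ("Data Engineers", "adls", "Storage Blob Data Contributor"),
    ("Data Scientists", "adls", "Storage Blob Data Reader"),
    ("Data Engineers", "key-vault", "Key Vault Secrets User"),
]


def broad_assignments(assignments):
    """Return the assignments whose role is in the too-broad set."""
    return [entry for entry in assignments if entry[2] in BROAD_ROLES]
```

An empty result from `broad_assignments(ASSIGNMENTS)` means nothing in the plan uses a flagged role. Note that `Key Vault Secrets User` only applies when the vault uses the Azure RBAC permission model rather than access policies.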
- Documentation links shared via chat.
- Question raised around custom OS images and installing security/monitoring agents.
- Databricks guidance:
- OS customisation not recommended.
- Native monitoring and compliance tooling available.
- Identified as a potential security blocker requiring AIB review.
- All infrastructure to be provisioned via Terraform, including:
- VNets
- Private Endpoints
- Databricks workspaces
- Databricks Terraform provider to be used.
- Application teams manage Databricks configuration post-deployment.
- Security Analysis Tool (SAT) introduced to assess Databricks workspace security posture.
- Limitation:
- Does not assess broader Azure infrastructure.
- Additional observability required for AIB standards.
- Requirement for outbound connectivity from Databricks compute.
- Used for:
- Source control (Bitbucket)
- CI/CD (Cloudbase)
- Network specifics still to be confirmed.
- Traffic is predominantly outbound from AIB to Databricks.
- Minimal inbound traffic.
- Firewall traversal and routing remain under review.
- Must align with AIB network security patterns.
- Workshop will proceed with:
- Explicit assumptions documented
- Open items tracked for follow-up
- Outstanding questions to be resolved post-workshop.
- Relay Agent / Host Subnet:
- Is a VM or agent required for Secure Cluster Connectivity (SCC)?
- What is hosted in the host subnet?
- Azure Firewall Traffic Routing:
- Does traffic to the Databricks control plane stay on the Microsoft backbone?
- Are public IPs involved?
- FQDN / IP Resolution:
- How are Databricks control plane FQDNs resolved?
- Firewall rule implications?
- Custom OS Images & Hardening:
- Can AIB-mandated security agents be deployed?
- Are alternative controls acceptable?
- Outbound Connectivity Patterns:
- Any bidirectional or special outbound requirements?
- Bitbucket and Cloudbase specifics?
- Serverless Compute Connectivity:
- When is it required (Phase Zero vs later)?
- What network changes are needed?
- Minimum RBAC Permissions:
- Confirm least-privilege RBAC for storage and Key Vault.
- Bitbucket & Cloudbase Network Path:
- Exact connectivity paths and firewall requirements.
- Databricks Control Plane → ADLS:
- Any direct connectivity required, or compute-only access?
- IAM Roles & Entra ID Groups:
- Minimum required roles and group model.
These items were identified as potential blockers and require follow-up with Databricks and internal AIB stakeholders before finalising the architecture.