Execution Plan
Here's a full step-by-step guide to implementing the retail data pipeline in a Databricks notebook using Delta Live Tables (DLT). It covers the Bronze → Silver → Gold layers, and you'll be able to run it end-to-end inside a Databricks workspace.
Ensure the following are ready before coding:
| Requirement | Description |
|---|---|
| ✅ Workspace access | You have access to a Databricks workspace |
| ✅ Unity Catalog enabled | Recommended for managing tables, schemas, and shares |
| ✅ Delta Sharing configured | Raw SAP data has already landed in Delta format (Bronze tables) |
| ✅ Cluster configured | DLT pipelines run on managed compute (auto-configured when the pipeline is created) |
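With those prerequisites met, the three layers can be defined in a single notebook attached to a DLT pipeline. Below is a minimal sketch; the source table `sap_share.retail.raw_sap_sales` and the columns `material_id`, `net_amount`, `transaction_id`, and `sale_date` are illustrative assumptions, not names taken from this repo, so substitute your own.

```python
import dlt
from pyspark.sql import functions as F

# Bronze: raw SAP data exactly as landed via Delta Sharing.
# Table name is a placeholder for your shared table.
@dlt.table(comment="Raw SAP sales records landed in Delta format")
def bronze_sales():
    return spark.read.table("sap_share.retail.raw_sap_sales")

# Silver: cleaned, deduplicated, and validated records.
@dlt.table(comment="Cleaned sales with basic quality expectations")
@dlt.expect_or_drop("valid_material", "material_id IS NOT NULL")
@dlt.expect_or_drop("valid_amount", "net_amount >= 0")
def silver_sales():
    return (
        dlt.read("bronze_sales")
        .withColumn("sale_date", F.to_date("sale_date"))
        .dropDuplicates(["transaction_id"])
    )

# Gold: business-level aggregate ready for BI or GenAI consumption.
@dlt.table(comment="Daily revenue by material")
def gold_daily_revenue():
    return (
        dlt.read("silver_sales")
        .groupBy("sale_date", "material_id")
        .agg(F.sum("net_amount").alias("total_revenue"))
    )
```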
Once the end-to-end flow works, consider these extensions (hedged sketches for the first four follow the list):

- Add SCD Type 2 logic to the Silver layer, if product attributes change over time (sketch below)
- Load data via Auto Loader for incremental, near-real-time Bronze ingestion (sketch below)
- Move shared logic from notebooks into Python modules for production modularization (sketch below)
- Register validated Silver tables in the Feature Store for ML models (sketch below)
- Connect Gold tables to BI tools (Power BI, Tableau) or GenAI semantic pipelines
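For SCD Type 2, DLT's `apply_changes` API can maintain the history for you. In this sketch the change-feed table `bronze_products_cdc` and the `product_id` / `change_ts` columns are assumptions for illustration:

```python
import dlt
from pyspark.sql.functions import col

# Hypothetical stream of product-attribute changes; swap in your real CDC source.
@dlt.view
def product_updates():
    return spark.readStream.table("sap_share.retail.bronze_products_cdc")

# Target streaming table that DLT maintains with SCD Type 2 history.
dlt.create_streaming_table("silver_products_scd2")

# stored_as_scd_type=2 keeps one row per attribute version,
# with __START_AT / __END_AT columns marking each row's validity window.
dlt.apply_changes(
    target="silver_products_scd2",
    source="product_updates",
    keys=["product_id"],
    sequence_by=col("change_ts"),
    stored_as_scd_type=2,
)
```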
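For Auto Loader, a minimal streaming Bronze table looks like the following; the landing path and file format are assumptions, and DLT manages the Auto Loader schema location for you:

```python
import dlt

# Incrementally ingest new raw SAP files as they land.
# Path and format are placeholders; point them at your actual landing zone.
@dlt.table(comment="Streaming Bronze ingestion of raw SAP extracts via Auto Loader")
def bronze_sales_stream():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", "true")
        .load("/Volumes/retail/landing/sales/")
    )
```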
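For moving from notebooks to Python modules, one common pattern is keeping transformations as plain functions in a repo module so they can be unit-tested outside Databricks; the module path and function here are illustrative:

```python
# transformations/sales.py -- hypothetical module checked into the repo
from pyspark.sql import DataFrame, functions as F

def clean_sales(df: DataFrame) -> DataFrame:
    """Silver-layer cleaning logic, callable from DLT notebooks and unit tests."""
    return (
        df.withColumn("sale_date", F.to_date("sale_date"))
          .dropDuplicates(["transaction_id"])
    )
```

A DLT notebook then just does `from transformations.sales import clean_sales` and applies it inside the `@dlt.table` function.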
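For the Feature Store, a sketch using the Unity Catalog feature-engineering client (from the `databricks-feature-engineering` package); the three-level names and the primary key are assumptions:

```python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Keep only current product attributes (open SCD2 rows), since a feature
# table needs a unique primary key per row.
features_df = (
    spark.table("retail.silver.silver_products_scd2")
    .where("__END_AT IS NULL")
)

fe.create_table(
    name="retail.ml.product_features",
    primary_keys=["product_id"],
    df=features_df,
    description="Validated product attributes for ML models",
)
```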
Would you like me to scaffold a modular, production-ready version (separate notebooks for each layer, with parameterization and tests), or help you validate the Bronze ingestion process first?