# Deployment

## 🧱 Folder Structure Recap (inside `retail_dlt_pipeline/`)

```
retail_dlt_pipeline/
├── bronze/
├── silver/
├── gold/
├── config/
├── tests/
├── README.md
└── .github/
    └── workflows/
        └── dlt-ci-cd.yml   ← CI/CD workflow
```

## ✅ Requirements

1. **Databricks PAT (Personal Access Token)**
   In your Databricks workspace, go to User Settings → Developer → Generate a new token.

2. **GitHub secrets**
   Store the token and the workspace URL as repository secrets:

   - `DATABRICKS_TOKEN`
   - `DATABRICKS_HOST` (e.g., `https://<your-workspace>.databricks.com`)
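Once both secrets exist on the repository, the workflow reads them through the `secrets` context; this is the same `env:` block used in the full workflow file below:

```yaml
env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```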

## 📄 `.github/workflows/dlt-ci-cd.yml`

```yaml
name: DLT CI/CD

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  DATABRICKS_REPO_PATH: /Repos/[email protected]/retail_dlt_pipeline

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install databricks-cli pytest

      - name: Run unit tests (dry-run)
        run: pytest retail_dlt_pipeline/tests --maxfail=1 --disable-warnings

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install Databricks CLI
        run: pip install databricks-cli

      - name: Configure CLI
        run: |
          databricks configure --token <<EOF
          $DATABRICKS_HOST
          $DATABRICKS_TOKEN
          EOF
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

      - name: Push to Databricks Repos
        run: |
          databricks repos update --path "$DATABRICKS_REPO_PATH" --branch main || \
          databricks repos create --url https://github.com/${{ github.repository }} --provider gitHub --path "$DATABRICKS_REPO_PATH"
```
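One thing to note about the `deploy` job: because the workflow also triggers on `pull_request`, the deploy job as written runs for PRs too once `test` passes. If that is not desired, a common guard is an `if:` condition on the job; the snippet below is a suggested tweak, not part of the workflow above:

```yaml
  deploy:
    needs: test
    runs-on: ubuntu-latest
    # Suggested guard (not in the original workflow): deploy only on direct
    # pushes to main, so pull_request runs stop after the test job.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
      # ... remaining deploy steps unchanged ...
```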

## 🧪 Optional: Add pytest-compatible tests

Inside `tests/test_data_quality.py`, wrap test functions for local validation:

```python
def test_schema_integrity(spark):
    # `spark` is expected to be provided as a pytest fixture
    # (e.g., defined in conftest.py or supplied by the pytest-spark plugin).
    df = spark.read.option("header", True).csv("tests/test_sample.csv")
    assert "material_id" in df.columns
```

## 🧬 Optional: Trigger DLT Pipeline via API

Add a final job to the workflow to call the DLT REST API:

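A minimal sketch of such a job is shown below. It assumes the target pipeline's ID is stored as an additional GitHub secret named `DLT_PIPELINE_ID` (a hypothetical name, not defined elsewhere in this repo) and uses the Delta Live Tables REST endpoint for starting a pipeline update (`POST /api/2.0/pipelines/{pipeline_id}/updates`):

```yaml
  # Add under the existing jobs: section of dlt-ci-cd.yml.
  trigger_dlt:
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - name: Start DLT pipeline update
        run: |
          # Kick off a new update of the DLT pipeline via the Pipelines REST API.
          curl --fail -s -X POST \
            -H "Authorization: Bearer $DATABRICKS_TOKEN" \
            "$DATABRICKS_HOST/api/2.0/pipelines/$DLT_PIPELINE_ID/updates"
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
          DLT_PIPELINE_ID: ${{ secrets.DLT_PIPELINE_ID }}  # hypothetical secret holding the pipeline ID
```

The pipeline ID can be copied from the pipeline's settings page in the Databricks UI.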