# Deployment

## 🧱 Folder Structure Recap (inside `retail_dlt_pipeline/`)

```
retail_dlt_pipeline/
├── bronze/
├── silver/
├── gold/
├── config/
├── tests/
├── README.md
└── .github/
    └── workflows/
        └── dlt-ci-cd.yml   ← CI/CD workflow
```

## ✅ Requirements

1. **Databricks PAT (Personal Access Token)**
   In your Databricks workspace, go to User Settings → Developer → Generate a new token.

2. **GitHub secrets**
   Store the token and the workspace URL as repository secrets:

   - `DATABRICKS_TOKEN`
   - `DATABRICKS_HOST` (e.g., `https://<your-workspace>.databricks.com`)
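Once both secrets exist on the repository, the workflow reads them through the `secrets` context; this is the same `env:` block used in the full workflow file below:

```yaml
env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```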

## 📄 `.github/workflows/dlt-ci-cd.yml`

```yaml
name: DLT CI/CD

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  DATABRICKS_REPO_PATH: /Repos/[email protected]/retail_dlt_pipeline

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install databricks-cli pytest

      - name: Run unit tests (dry-run)
        run: pytest retail_dlt_pipeline/tests --maxfail=1 --disable-warnings

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install Databricks CLI
        run: pip install databricks-cli

      - name: Configure CLI
        run: |
          databricks configure --token <<EOF
          $DATABRICKS_HOST
          $DATABRICKS_TOKEN
          EOF
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

      - name: Push to Databricks Repos
        run: |
          databricks repos update --path "$DATABRICKS_REPO_PATH" --branch main || \
          databricks repos create --url https://github.com/${{ github.repository }} --provider gitHub --path "$DATABRICKS_REPO_PATH"
```
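One thing to note about the `deploy` job: because the workflow also triggers on `pull_request`, the deploy job as written runs for PRs too once `test` passes. If that is not desired, a common guard is an `if:` condition on the job; the snippet below is a suggested tweak, not part of the workflow above:

```yaml
  deploy:
    needs: test
    runs-on: ubuntu-latest
    # Suggested guard (not in the original workflow): deploy only on direct
    # pushes to main, so pull_request runs stop after the test job.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
      # ... remaining deploy steps unchanged ...
```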

## 🧪 Optional: Add pytest-compatible tests

Inside `tests/test_data_quality.py`, wrap test functions for local validation:

```python
def test_schema_integrity(spark):
    # `spark` is expected to be provided as a pytest fixture
    # (e.g., defined in conftest.py or supplied by the pytest-spark plugin).
    df = spark.read.option("header", True).csv("tests/test_sample.csv")
    assert "material_id" in df.columns
```

## 🧬 Optional: Trigger DLT Pipeline via API

Add a final job to the workflow to call the DLT REST API:

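A minimal sketch of such a job is shown below. It assumes the target pipeline's ID is stored as an additional GitHub secret named `DLT_PIPELINE_ID` (a hypothetical name, not defined elsewhere in this repo) and uses the Delta Live Tables REST endpoint for starting a pipeline update (`POST /api/2.0/pipelines/{pipeline_id}/updates`):

```yaml
  # Add under the existing jobs: section of dlt-ci-cd.yml.
  trigger_dlt:
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - name: Start DLT pipeline update
        run: |
          # Kick off a new update of the DLT pipeline via the Pipelines REST API.
          curl --fail -s -X POST \
            -H "Authorization: Bearer $DATABRICKS_TOKEN" \
            "$DATABRICKS_HOST/api/2.0/pipelines/$DLT_PIPELINE_ID/updates"
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
          DLT_PIPELINE_ID: ${{ secrets.DLT_PIPELINE_ID }}  # hypothetical secret holding the pipeline ID
```

The pipeline ID can be copied from the pipeline's settings page in the Databricks UI.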