# Deployment
## 🧱 Folder Structure Recap (inside `retail_dlt_pipeline/`)

```
retail_dlt_pipeline/
├── bronze/
├── silver/
├── gold/
├── config/
├── tests/
├── README.md
└── .github/
    └── workflows/
        └── dlt-ci-cd.yml   ← CI/CD workflow
```

## ✅ Requirements
- **Databricks PAT (Personal Access Token):** go to User Settings → Developer → Generate new token.
- Store the following as GitHub secrets:
  - `DATABRICKS_TOKEN`
  - `DATABRICKS_HOST` (e.g., `https://<your-workspace>.databricks.com`)
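
Before wiring the secrets into a workflow, it's worth a quick local sanity check that the host and token actually authenticate. A minimal sketch, assuming `requests` is installed and both values are exported as environment variables (the `/api/2.0/repos` endpoint is just a convenient authenticated call):

```python
import os
import requests

# Assumes DATABRICKS_HOST includes the https:// scheme, as in the example above.
host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

# GET /api/2.0/repos lists repos the token can see; any 200 proves auth works.
resp = requests.get(
    f"{host}/api/2.0/repos",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(f"Token OK, {len(resp.json().get('repos', []))} repos visible")
```

If you use the GitHub CLI, the verified values can then be stored with `gh secret set DATABRICKS_TOKEN` and `gh secret set DATABRICKS_HOST`.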
## `.github/workflows/dlt-ci-cd.yml`

```yaml
name: DLT CI/CD

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  DATABRICKS_REPO_PATH: /Repos/[email protected]/retail_dlt_pipeline

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install databricks-cli pytest

      - name: Run unit tests (dry-run)
        run: pytest retail_dlt_pipeline/tests --maxfail=1 --disable-warnings

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install Databricks CLI
        run: pip install databricks-cli

      - name: Configure CLI
        run: |
          databricks configure --token <<EOF
          $DATABRICKS_HOST
          $DATABRICKS_TOKEN
          EOF
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

      - name: Push to Databricks Repos
        run: |
          databricks repos update --path "$DATABRICKS_REPO_PATH" --branch main || \
          databricks repos create --url https://github.com/${{ github.repository }} --provider gitHub --path "$DATABRICKS_REPO_PATH"
```
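
As an alternative to shelling out to the legacy CLI, the same update-or-create logic can be written against the Repos REST API directly. A minimal Python sketch, assuming the same environment variables as above (the repo path mirrors `DATABRICKS_REPO_PATH` from the workflow):

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"].rstrip("/")
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
repo_path = "/Repos/[email protected]/retail_dlt_pipeline"  # same path as the workflow env

# Look up the repo by path; /api/2.0/repos supports a path_prefix filter.
existing = requests.get(
    f"{host}/api/2.0/repos",
    headers=headers,
    params={"path_prefix": repo_path},
).json().get("repos", [])

if existing:
    # Repo already exists: fast-forward it to main (PATCH /api/2.0/repos/{id}).
    repo_id = existing[0]["id"]
    requests.patch(
        f"{host}/api/2.0/repos/{repo_id}",
        headers=headers,
        json={"branch": "main"},
    ).raise_for_status()
else:
    # First run: clone the GitHub repo into the workspace (POST /api/2.0/repos).
    requests.post(
        f"{host}/api/2.0/repos",
        headers=headers,
        json={
            "url": "https://github.com/RogerThattt/SAP-Retail-Data-Databricks",
            "provider": "gitHub",
            "path": repo_path,
        },
    ).raise_for_status()
```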
## 🧪 Optional: Add pytest-compatible tests

Inside `tests/test_data_quality.py`, wrap test functions for local validation:
```python
def test_schema_integrity(spark):
    df = spark.read.option("header", True).csv("tests/test_sample.csv")
    assert "material_id" in df.columns
```
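
The `spark` argument above is a pytest fixture, so local runs need one defined. A minimal `tests/conftest.py` sketch, assuming `pyspark` is installed locally (the session config is illustrative):

```python
# tests/conftest.py
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # Small local session so the data-quality tests can run outside Databricks.
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("retail-dlt-tests")
        .getOrCreate()
    )
    yield session
    session.stop()
```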
## 🧬 Optional: Trigger DLT Pipeline via API

Add a final step to the workflow that calls the DLT REST API:

```yaml
      - name: Trigger DLT pipeline
        run: |
          curl -X POST "$DATABRICKS_HOST/api/2.0/pipelines/your-dlt-pipeline-id/updates" \
            -H "Authorization: Bearer $DATABRICKS_TOKEN"
```

You can get your DLT pipeline ID from the Databricks UI.