Azure Databricks Guzzle Setup
- Create the following databases in SQL Server:
  - hive_metastore - to be used as the external Hive metastore
  - guzzle - to be used as the Guzzle repository where job audits, batch records, recon/DQ outputs etc. will be stored
  - guzzle_api_db - database for the Guzzle API server
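
  As a minimal sketch, the three databases can be created with sqlcmd (assuming sqlcmd is available; <sqlserver-host>, <admin-user> and <admin-password> are placeholders for your environment):

  ```bash
  # Hypothetical example: create the three databases used by Guzzle with sqlcmd.
  sqlcmd -S <sqlserver-host> -U <admin-user> -P <admin-password> \
    -Q "CREATE DATABASE hive_metastore; CREATE DATABASE guzzle; CREATE DATABASE guzzle_api_db;"
  ```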
- Create a Databricks workspace and follow the steps in https://docs.azuredatabricks.net/user-guide/advanced/external-hive-metastore.html to set up the external Hive metastore for the Databricks cluster. Add the following to the cluster Spark config:
  ```
  spark.sql.hive.metastore.version 1.2.1
  spark.sql.hive.metastore.jars builtin
  spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<sqlserver-host>;database=hive_metastore;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30
  spark.hadoop.javax.jdo.option.ConnectionUserName <database-username>
  spark.hadoop.javax.jdo.option.ConnectionPassword <password>
  spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
  datanucleus.autoCreateSchema true
  datanucleus.fixedDatastore false
  ```
- Create an Azure Blob storage account.
- Create a container named guzzlehome, where files related to the Guzzle home (configs/binaries/libraries) will be stored.
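
  For example, the storage account and container can be created with the Azure CLI (a sketch; the resource group, location and SKU are assumptions to adjust for your environment):

  ```bash
  # Sketch: create the storage account and the guzzlehome container using the Azure CLI.
  az storage account create --name <storage-account-name> --resource-group <resource-group> --location <region> --sku Standard_LRS
  az storage container create --name guzzlehome --account-name <storage-account-name> --account-key <storage-account-key>
  ```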
- In the Databricks cluster Spark config, add the following configuration:
  ```
  fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-key>
  ```
- Upload the Guzzle home to the guzzlehome container.
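
  One way to do this is with the Azure CLI (a sketch, assuming the Guzzle home has been extracted to a local directory named ./guzzle_home, which is a hypothetical path):

  ```bash
  # Sketch: upload the local Guzzle home directory to the guzzlehome container.
  az storage blob upload-batch \
    --destination guzzlehome \
    --source ./guzzle_home \
    --account-name <storage-account-name> \
    --account-key <storage-account-key>
  ```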
- Generate an API authentication token as described in https://docs.azuredatabricks.net/api/latest/authentication.html.
- Mount the guzzlehome container into DBFS by following the steps in https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html#mount-azure-blob-storage:
  ```scala
  dbutils.fs.mount(
    source = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>",
    mountPoint = "/mnt/<mount-name>",
    extraConfigs = Map("<conf-key>" -> dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")))
  ```
- Set the environment variable GUZZLE_HOME=/dbfs/<directory where guzzlehome container is mounted> on the Spark cluster.
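
  For example, if the container were mounted at /mnt/guzzlehome (a hypothetical mount name), the cluster's environment variables would include:

  ```bash
  # Hypothetical example: GUZZLE_HOME points at the DBFS mount chosen in the previous step.
  GUZZLE_HOME=/dbfs/mnt/guzzlehome
  ```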
- Create guzzle-log4j.properties in the /dbfs directory with the following content:
  ```properties
  log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
  log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
  log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
  log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
  log4j.logger.com.justanalytics=INFO, RollingAppender
  ```
- Write an init script that appends the content of /dbfs/guzzle-log4j.properties to /databricks/spark/dbconf/log4j/driver/log4j.properties, as in the sketch below.
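
  A minimal sketch of such an init script (the script location dbfs:/init/guzzle-log4j.sh is an assumption; store it wherever your cluster init scripts live and register it as a cluster-scoped init script):

  ```bash
  #!/bin/bash
  # Sketch of a cluster init script (e.g. dbfs:/init/guzzle-log4j.sh, a hypothetical path).
  # Appends the Guzzle log4j settings to the driver log4j configuration at cluster start.
  cat /dbfs/guzzle-log4j.properties >> /databricks/spark/dbconf/log4j/driver/log4j.properties
  ```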
- Install the guzzle-azure-databricks utility jar on the cluster.
- Restart the cluster.
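
  This can be done from the cluster UI or, as a sketch, with the Databricks CLI (assuming the CLI is installed and configured against the workspace):

  ```bash
  # Sketch: restart the cluster via the Databricks CLI.
  databricks clusters restart --cluster-id <databricks-cluster-id>
  ```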
- In guzzle.yml, set the database, spark and guzzle configs as follows:
  ```yaml
  database:
    type: jdbc
    properties:
      jdbc_url: jdbc:sqlserver://<sqlserver-host>;database=guzzle;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30
      username: <database-username>
      password: <password>
  ...
  spark:
    run_mode: azure-databricks
    properties:
      api_url: https://<region>.azuredatabricks.net
      auth_token: <api-token>
      cluster_id: <databricks-cluster-id>
      dbfs_guzzle_dir: dbfs:/<directory where guzzlehome container is mounted on dbfs>
  ...
  ```
- Mount the guzzlehome container into the Linux file system using the following steps:
  ```bash
  wget https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb
  sudo dpkg -i packages-microsoft-prod.deb
  sudo apt-get update
  sudo apt-get -y install blobfuse
  echo "user_allow_other" | sudo tee -a /etc/fuse.conf
  sudo mkdir /mnt/blobfusetmp
  sudo chown <username> /mnt/blobfusetmp
  echo "accountName <account-name>
  accountKey <account-key>
  containerName guzzlehome" > /home/<username>/fuse_connection.cfg
  chmod 777 /home/<username>/fuse_connection.cfg
  sudo mkdir /guzzle
  sudo chmod 777 /guzzle
  sudo -H -u <username> bash -c "blobfuse /guzzle --tmp-path=/mnt/blobfusetmp -o allow_other --config-file=/home/<username>/fuse_connection.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --file-cache-timeout-in-seconds=10"
  ```
- Run the Guzzle database initializer to generate the raw schema content for the guzzle database using the following command:
  ```bash
  java -cp /guzzle/libs/*:/guzzle/libs:/guzzle/bin/common.jar com.justanalytics.guzzle.common.DatabaseInitializer generate
  ```
- Modify the raw schema content as per SQL Server syntax (along with any other necessary changes) and execute it in the guzzle database.
- Follow the steps in https://github.com/ja-guzzle/docs/wikis/design-/Guzzle-UI-deployment-runbook to deploy the API and UI applications.
- To create a database in ADLS:
  ```sql
  create database demo location 'adl://guzzletest.azuredatalakestore.net/hive-data/demodb';
  create table demo.users ( id int, first_name string, last_name string, age decimal(2,0), created_time timestamp) partitioned by (instance_id bigint, system string, location string) stored as parquet;
  ```
- Initialize the Guzzle utility from a Databricks notebook:
  ```scala
  import com.justanalytics.guzzle.util.databricks.azure.GuzzleUtils
  val guzzle = new GuzzleUtils(<api-url>, <api-username>, dbutils.secrets.get(scope = "demoscope", key = <api-password>), <cluster-name>)
  ```
- Run a Guzzle job using the Guzzle Databricks utility:
  ```scala
  guzzle.runJob(<job-config-name>, <environment-name>, Map(<job-params-including-business-date>))
  ```