Guzzle on Azure Databricks - ja-guzzle/guzzle_docs GitHub Wiki
- Overview
- Azure resources setup
- Guzzle Setup
- Azure resources setup
- 1. Create Azure Blob storage to host the Guzzle home
- 2. Create Azure ADLSv2 to store the target tables
- 3. Create Databricks Workspace
- 4. Azure SQL Server DB and SQL Server Instance
- 5. VM To run Guzzle API and UI application
- Guzzle Setup
- 1. Fuse for mounting blob to Oracle
- 2. Download Guzzle release bundle
- 3. Guzzle configuration changes
- 4. Test sample guzzle job
## Azure resources setup
### 1. Create Azure Blob storage to host the Guzzle home
Storage Account Name: testguzzleblob
Create the container to store the Guzzle home in this storage account:
a. Go to the storage account and click on Blobs
b. Enter the container name guzzlehome and press OK
c. Retrieve the access key for storage account testguzzleblob
### 2. Create Azure ADLSv2 to store the target tables
Storage Account Name: testguzzleadlsv2
Click Next to go to Advanced settings and enable "Hierarchical namespace".
Then click Review and Create.
Create the container to store the data in this storage account:
a. Go to storage account testguzzleadlsv2 and click on "Data Lake Gen2 file systems"
b. Enter the file system name data and press OK
c. Retrieve the access key for storage account testguzzleadlsv2. This is required when creating the Databricks workspace cluster.
### 3. Create Databricks Workspace
a. Create DB Workspace
Enter all the details as highlighted below:
b. Go to the "testguzzle" workspace, click "Launch Workspace", and create the "test" cluster with the settings below:
Go to Advanced settings and enter the following:
On the Spark tab, enter the access key for testguzzleadlsv2 as the Spark config (see step 2.c for the key), followed by the environment variables:
fs.azure.account.key.testguzzleadlsv2.dfs.core.windows.net qdYM4RrzuhPTx9AQ+vuRQO9+o3xOmmka/9cWuVUOwC+SBCA16hSF8H/xwemEIEvEGbshajm7Nt4Q1dfahzRoTQ==
PYSPARK_PYTHON=/databricks/python3/bin/python3
GUZZLE_HOME=/dbfs/mnt/guzzle
c. Click on Create Cluster. It takes up to 1 minute to create the cluster.
d. Launch a notebook and create the following cells:
dbutils.fs.mount(
source = "wasbs://guzzlehome@testguzzleblob.blob.core.windows.net/",
mountPoint = "/mnt/guzzle",
extraConfigs = Map("fs.azure.account.key.testguzzleblob.blob.core.windows.net" -> "z7nxS6O2WVGeHfjMRvjVYeQ3P5t+qBSiA1r2gI0cOP3JiLQl03mX27ZPvuZBBJZOabNwUwfDSlE03uQeukgq9Q=="))
Test the mounted directory is working:
%sh
ls -ltrh /dbfs/mnt/guzzle
Create the database to store the target tables. Its storage is ADLSv2 (the keys were added to the Spark config in step 3.b above).
%sql
create database demo location 'abfss://data@testguzzleadlsv2.dfs.core.windows.net/hive/demo';
create table t1(i int);
insert into t1 values(1);
Create the init script file, which Databricks requires in order to capture Guzzle logs for the driver program correctly:
%sh
echo "log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.logger.com.justanalytics=INFO, RollingAppender
" > /dbfs/guzzle-log4j.properties
mkdir -p /dbfs/databricks/initscript
echo "#!/bin/bash
cat /dbfs/guzzle-log4j.properties >> /databricks/spark/dbconf/log4j/driver/log4j.properties
cat /dbfs/guzzle-log4j.properties >> /databricks/spark/dbconf/log4j/executor/log4j.properties
" > /dbfs/databricks/initscript/init.sh
Update the cluster's Init Scripts setting to include the init script generated by the notebook above. The script path has to be set to dbfs:/databricks/initscript/init.sh. The cluster has to be restarted after that.
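The nested quoting in the `echo` commands that generate the two files is easy to get wrong. A minimal sketch of the same two files written with quoted here-documents follows. `DBFS_ROOT` is a hypothetical variable, not part of Guzzle or Databricks; it defaults to a temporary directory so the sketch can be tried outside a cluster, and would be `/dbfs` in a notebook `%sh` cell.

```shell
#!/bin/sh
# Sketch: write the log4j fragment and the init script with here-documents.
# DBFS_ROOT is an illustrative variable; on Databricks it would be /dbfs.
DBFS_ROOT="${DBFS_ROOT:-$(mktemp -d)}"
mkdir -p "$DBFS_ROOT/databricks/initscript"

# A quoted delimiter ('EOF') stops the shell from expanding anything in the body.
cat > "$DBFS_ROOT/guzzle-log4j.properties" <<'EOF'
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.logger.com.justanalytics=INFO, RollingAppender
EOF

cat > "$DBFS_ROOT/databricks/initscript/init.sh" <<'EOF'
#!/bin/bash
cat /dbfs/guzzle-log4j.properties >> /databricks/spark/dbconf/log4j/driver/log4j.properties
cat /dbfs/guzzle-log4j.properties >> /databricks/spark/dbconf/log4j/executor/log4j.properties
EOF

# Syntax-check the generated script without running it.
sh -n "$DBFS_ROOT/databricks/initscript/init.sh" \
    && echo "init.sh written under $DBFS_ROOT/databricks/initscript"
```

The generated files have the same content as the ones produced by the `echo` version, minus the trailing blank line.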
### 4. Azure SQL Server DB and SQL Server Instance
Db name: testguzzledb
Server: testguzzledbserver (the full host name shall be: testguzzledbserver.database.windows.net)
User: demo
Password: Admin@123
### 5. VM To run Guzzle API and UI application
a. Create VM
VM Name: testguzzlevm
Hostname: testguzzlevm (the full host name shall be: testguzzlevm.eastasia.cloudapp.azure.com)
User: demo
Password: Admin@123456
b. Configure the domain name for the VM:
c. Enable network access (open all the ports so that they are accessible from the JA VM, i.e. the JA public IP)
## Guzzle Setup
### 1. Fuse for mounting blob to Oracle
a. Login to the Guzzle VM with the demo user account
Host: testguzzlevm.eastasia.cloudapp.azure.com
User: demo
Password: Admin@123456
When prompted for the password, enter the demo account password above.
b. Install the fuse package
sudo rpm -Uvh https://packages.microsoft.com/config/rhel/7/packages-microsoft-prod.rpm
sudo yum install blobfuse fuse -y
echo "user_allow_other" | sudo tee -a /etc/fuse.conf
sudo mkdir /mnt/blobfusetmp
sudo chown <username> /mnt/blobfusetmp
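Re-running the `tee -a` command appends `user_allow_other` to `/etc/fuse.conf` again each time. A minimal sketch of an idempotent append follows. `append_once` is a hypothetical helper, and the demonstration uses a scratch file in place of `/etc/fuse.conf` so it can be run without root.

```shell
#!/bin/sh
# Sketch: append a line to a file only if an identical line is not already there.
# append_once is an illustrative helper name.
append_once() {
    line="$1"
    file="$2"
    # -x matches whole lines, -F treats the text literally, -q suppresses output.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file standing in for /etc/fuse.conf.
conf=$(mktemp)
append_once "user_allow_other" "$conf"
append_once "user_allow_other" "$conf"   # second call changes nothing
cat "$conf"
```

Against the real `/etc/fuse.conf` the function would have to run as root.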
c. Mount the guzzle home
echo "accountName testguzzleblob
accountKey z7nxS6O2WVGeHfjMRvjVYeQ3P5t+qBSiA1r2gI0cOP3JiLQl03mX27ZPvuZBBJZOabNwUwfDSlE03uQeukgq9Q==
containerName guzzlehome" > /home/demo/fuse_connection.cfg
chmod 600 /home/demo/fuse_connection.cfg
sudo mkdir /guzzle
sudo chmod 777 /guzzle
sudo -H -u demo bash -c "blobfuse /guzzle --tmp-path=/mnt/blobfusetmp -o allow_other --config-file=/home/demo/fuse_connection.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --file-cache-timeout-in-seconds=10"
cd /guzzle
echo "test" >a.log
ls -ltrh
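The connection file holds the storage account key, so it is worth keeping it readable by its owner only. A minimal sketch that writes the same three settings under a restrictive umask follows. `write_fuse_config` and the variable names are hypothetical, the key is a dummy value, and the file is written to a temporary directory; on the VM the target would be `/home/demo/fuse_connection.cfg`.

```shell
#!/bin/sh
# Sketch: write a blobfuse connection file that only its owner can read.
# write_fuse_config is an illustrative helper name.
write_fuse_config() {
    cfg="$1"; account="$2"; key="$3"; container="$4"
    (
        # umask 077 inside a subshell: new files get mode 600 and the
        # umask of the calling shell is left unchanged.
        umask 077
        printf 'accountName %s\naccountKey %s\ncontainerName %s\n' \
            "$account" "$key" "$container" > "$cfg"
    )
    # Tighten permissions explicitly in case the file already existed.
    chmod 600 "$cfg"
}

# Demonstration with a dummy key in a scratch directory.
workdir=$(mktemp -d)
cfg_file="$workdir/fuse_connection.cfg"
write_fuse_config "$cfg_file" testguzzleblob "dummy-key==" guzzlehome
ls -l "$cfg_file"
```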
### 2. Download Guzzle release bundle
a. Login to VM
b. Download the package using the commands below:
cd /guzzle
wget -q https://guzzlesa.blob.core.windows.net/guzzle-release/guzzle-0.7.34.tar.gz
tar xzf guzzle-0.7.34.tar.gz --strip-components 1
cd libs
wget -O mssql-jdbc-6.1.0.jre8.jar "https://guzzlesa.blob.core.windows.net/guzzle-release/mssql-jdbc-6.1.0.jre8.jar?sv=2018-03-28&ss=bqtf&srt=sco&sp=rwdlacup&se=2019-05-22T12:48:38Z&sig=gUNYA948mxn5rXzZG2yyv6yLdXppXmZpkPpy7jO9%2Bb8%3D&_=1558500986434"
mkdir -p ../api/libs
cp mssql-jdbc-6.1.0.jre8.jar ../api/libs
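The archive name passed to `tar` and the `-O` name passed to `wget` both have to match the file named in the URL. As a minimal sketch, the local name can be derived from the URL so that the two cannot drift apart. `url_basename` is a hypothetical helper, and the query string in the second URL is shortened for illustration.

```shell
#!/bin/sh
# Sketch: derive the local file name from a download URL.
# url_basename is an illustrative helper name.
url_basename() {
    url="$1"
    url="${url%%\?*}"              # drop any query string (e.g. a SAS token)
    printf '%s\n' "${url##*/}"     # keep the part after the last slash
}

release_url="https://guzzlesa.blob.core.windows.net/guzzle-release/guzzle-0.7.34.tar.gz"
# Query string shortened for the example.
jdbc_url="https://guzzlesa.blob.core.windows.net/guzzle-release/mssql-jdbc-6.1.0.jre8.jar?sv=2018-03-28&ss=bqtf"

archive=$(url_basename "$release_url")
jar=$(url_basename "$jdbc_url")
echo "$archive"
echo "$jar"

# The download steps would then read (not executed here):
#   wget -q "$release_url" && tar xzf "$archive" --strip-components 1
#   wget -O "$jar" "$jdbc_url"
```

Quoting `"$jdbc_url"` matters because an unquoted `&` in the query string would otherwise be interpreted by the shell.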
### 3. Guzzle configuration changes
- Login to the VM
- Export GUZZLE_HOME:
export GUZZLE_HOME=/guzzle
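Before running anything that depends on `GUZZLE_HOME`, it can help to confirm that the variable points at a real, writable directory (for example, that the blobfuse mount is actually up). A minimal sketch follows. `check_guzzle_home` is a hypothetical helper, not a Guzzle command, and the demonstration uses a scratch directory in place of `/guzzle`.

```shell
#!/bin/sh
# Sketch: fail early if GUZZLE_HOME is unset or not a writable directory.
# check_guzzle_home is an illustrative helper name.
check_guzzle_home() {
    if [ -z "${GUZZLE_HOME:-}" ]; then
        echo "GUZZLE_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$GUZZLE_HOME" ] || [ ! -w "$GUZZLE_HOME" ]; then
        echo "GUZZLE_HOME=$GUZZLE_HOME is not a writable directory" >&2
        return 1
    fi
    echo "GUZZLE_HOME=$GUZZLE_HOME looks usable"
}

# Demonstration with a scratch directory standing in for /guzzle.
demo_home=$(mktemp -d)
GUZZLE_HOME="$demo_home"
export GUZZLE_HOME
check_guzzle_home
```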
### 4. Test sample guzzle job