Guzzle Security - ja-guzzle/guzzle_docs GitHub Wiki
- SSH authentication to the Guzzle VM can be enabled using either a password or an SSH public key. Public key authentication is considered more secure than password authentication. Only administrators should log in with the default user created by the Azure VM (which usually has sudo permissions)
- Create a new Linux user guzzle (without sudo permissions) that can start/stop Guzzle services, view Guzzle log files, etc. All non-admin users can use this account to SSH into the Guzzle VM
Example Setup 1:
- The Guzzle VM must be accessible only from selected IP ranges, such as the Azure Data Factory (ADF) service IP ranges, whitelisted on-premises IP ranges, etc.
Example Setup 2:
- All inbound traffic to the Guzzle VM is blocked. The Guzzle VM is accessible only from a bastion machine running in the same VNet. Access to the bastion machine can be restricted to whitelisted on-premises IP ranges, etc.
- For Azure Data Factory access to Guzzle, the ADF integration runtime must run on a Windows machine in the same VNet as the Guzzle VM
- Once the Guzzle installation is done, add an outbound rule to block all outbound requests other than the required ones
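The IP-range restriction in the example setups can be illustrated with a small check. This is only a sketch of the allow-list idea, not how Azure enforces network rules. The ranges below are documentation-only placeholder addresses (assumptions), standing in for the real ADF service ranges and on-premises ranges.

```python
import ipaddress

# Hypothetical allow-list: placeholder ranges reserved for documentation,
# standing in for ADF service IP ranges and on-premises IP ranges.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed(source_ip, allowed=ALLOWED_RANGES):
    """Return True if source_ip falls inside any allowed CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

Anything outside the listed ranges is denied by default, which mirrors the "accessible only from selected IP ranges" requirement.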
- Blobfuse is used to mount blob storage on the Guzzle VM. Its config file contains the access key of the Guzzle blob storage account in plain text. This config file must be created as the root user, and other users must not have access to it. The Guzzle blob storage must be mounted by the root user only. When the Guzzle VM restarts, the Guzzle blob storage must be remounted automatically through a crontab entry of the root user
- Regular Linux users will not be able to view the fuse config file. However, they can still delete any file in the mounted Guzzle blob storage directory
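The requirement that only root can read the fuse config file can be verified with a short script. This is a minimal sketch, assuming a standard Linux permission model. The function name check_secret_file is hypothetical, and the path to the config file is passed on the command line because the source does not state where it lives.

```python
import os
import stat


def check_secret_file(path):
    """Return a list of problems with the file's ownership and permissions."""
    st = os.stat(path)
    problems = []
    # The file holds a plain-text access key, so it should be owned by root.
    if st.st_uid != 0:
        problems.append("owner is not root (uid %d)" % st.st_uid)
    # Any group/other permission bit means non-root users may access it.
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(
            "group/other permission bits set: %s" % stat.filemode(st.st_mode)
        )
    return problems


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, check_secret_file(p) or "ok")
```

A file created by root with mode 600 reports no problems. Note that this check covers only the config file; it does not address the second point above, that regular users can still delete files in the mounted directory.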
Multiple authentication mechanisms are supported when Azure SQL is used as the Guzzle repository database:
- Username/password: The common authentication mechanism, where a database username and password are specified as credentials
- Azure AD: Use an Azure AD username and password as credentials for authentication
- Azure Service Principal: Use service principal credentials for authentication
- The Guzzle passphrase file ($GUZZLE_HOME/passphrase) holds the secret passphrase used to encrypt/decrypt string values such as JDBC passwords, service principal secrets, Databricks workspace authentication tokens, etc.
- The JWT secret key in the application.yml of the Guzzle API application is used to sign the authentication token. This signed authentication token is used to call the Guzzle REST APIs
- Both of these config files are in blob storage and accessible to all users
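To show why the JWT secret key matters, here is a minimal sketch of signing and verifying a token with a shared secret. It assumes the HS256 (HMAC-SHA256) scheme; the source does not say which algorithm the Guzzle API uses, and the function names sign_token and verify_token are hypothetical, not Guzzle code.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT does."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_token(claims, secret):
    """Build a JWT-style token signed with HMAC-SHA256 (assumed algorithm)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":"), sort_keys=True).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_token(token, secret):
    """Recompute the signature and compare it in constant time."""
    try:
        signing_input, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(_b64url(expected).encode("utf-8"), sig.encode("utf-8"))


if __name__ == "__main__":
    token = sign_token({"sub": "admin"}, b"example-secret")
    print(verify_token(token, b"example-secret"))
    print(verify_token(token, b"other-secret"))
```

With a symmetric scheme like this, anyone who can read the secret key can produce tokens that verify successfully, which is why access to the file holding the key is a concern.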
The Guzzle API and Web applications must run over SSL so that all traffic from the client (browser, Azure Data Factory, etc.) to the server (applications running on the Guzzle VM) is secured
Single sign-on can be enabled to authenticate users in Guzzle with their Azure AD credentials
Once Guzzle is installed, it uses a default username and password for the admin user. Change this admin password to a more complex one, or remove the default admin user if single sign-on is enabled
The Databricks workspace must be deployed in a VNet. Once the Guzzle installation is done, add an outbound rule to block all outbound requests other than the required ones
When mounting storage accounts on Databricks workspaces from a notebook, first create secret scopes for access tokens, service principal secrets, etc., and reference those secrets in the notebook
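A minimal sketch of mounting with a secret instead of a hard-coded key follows. It assumes a blob storage container mounted over wasbs with an account access key stored in a secret scope. The helper build_blob_mount_args and all scope, key, account, container and mount point names are hypothetical. The secret lookup is passed in as a function so the helper can also be exercised outside a notebook.

```python
def build_blob_mount_args(get_secret, scope, key_name, account, container, mount_point):
    """Build keyword arguments for dbutils.fs.mount, reading the key from a secret scope."""
    # The access key never appears in the notebook source; it is fetched at run time.
    account_key = get_secret(scope=scope, key=key_name)
    host = "%s.blob.core.windows.net" % account
    return {
        "source": "wasbs://%s@%s" % (container, host),
        "mount_point": mount_point,
        "extra_configs": {"fs.azure.account.key.%s" % host: account_key},
    }


# In a Databricks notebook, where dbutils is predefined, usage would look like
# this (all names are placeholders):
#
# args = build_blob_mount_args(
#     dbutils.secrets.get, "guzzle-scope", "storage-key",
#     "examplestorage", "guzzle", "/mnt/guzzle",
# )
# dbutils.fs.mount(**args)
```

The same pattern applies to service principal secrets: the notebook names the scope and key, and the secret value itself stays in the secret scope.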
If an external Hive metastore is used, store the database password as a Databricks secret and reference that secret in the external Hive metastore configuration of the Databricks cluster
SQL Server databases must be accessible only from within the virtual network, using either a service endpoint or a private endpoint
Service endpoint:
1. Enable service endpoints in the guzzlevm, private-databricks, and public-databricks subnets
2. Add these subnets to the SQL Server firewall rules
3. An outbound rule for the 'Sql' destination is required in the Databricks and virtual machine NSGs
Private endpoint:
1. Add a private endpoint for the SQL Server to a subnet
2. No outbound rule for the 'Sql' destination is needed in the Databricks or virtual machine NSGs
3. A VNet-internal outbound rule is required
Azure storage accounts must be accessible only from within the virtual network, using either a service endpoint or a private endpoint. Follow steps similar to those for the SQL Server service/private endpoint (not verified)
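The difference between the two SQL Server options in terms of outbound NSG rules can be modelled as below. This is a deliberately simplified sketch, not the Azure API: it matches only on a destination service tag, whereas real NSG rules also match on protocol, ports and address prefixes. The class and function names are hypothetical, and the deny-all rule stands in for "block all outbound requests other than the required ones".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundRule:
    priority: int     # lower number is evaluated first
    destination: str  # service tag, or "*" for any destination
    allow: bool


def is_outbound_allowed(rules, destination_tag):
    """Apply the first matching rule in priority order; deny if none match."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.destination in ("*", destination_tag):
            return rule.allow
    return False


# Service endpoint: traffic is addressed to the 'Sql' service tag,
# so an explicit allow rule for that destination is needed.
SERVICE_ENDPOINT_RULES = [
    OutboundRule(100, "Sql", True),
    OutboundRule(4096, "*", False),
]

# Private endpoint: the database is reached through an address inside the VNet,
# so only the VNet-internal rule is needed.
PRIVATE_ENDPOINT_RULES = [
    OutboundRule(100, "VirtualNetwork", True),
    OutboundRule(4096, "*", False),
]

if __name__ == "__main__":
    print(is_outbound_allowed(SERVICE_ENDPOINT_RULES, "Sql"))
    print(is_outbound_allowed(PRIVATE_ENDPOINT_RULES, "VirtualNetwork"))
    print(is_outbound_allowed(PRIVATE_ENDPOINT_RULES, "Sql"))
```

In the private endpoint case the 'Sql' destination stays blocked, and that is fine because the traffic never leaves the VNet.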