Guzzle Security - ja-guzzle/guzzle_docs GitHub Wiki

Guzzle VM

SSH Access

  • SSH authentication to the Guzzle VM can be enabled using either a password or an SSH public key. SSH public key authentication is considered more secure than password authentication. Only administrators should log in with the default user created by the Azure VM (which usually has sudo permissions).
  • We can create a new Linux user, guzzle (without sudo permissions), that can start/stop Guzzle services, view Guzzle log files, etc. All non-admin users can use this user to SSH into the Guzzle VM.
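To check that password login is really off once key-based access is in place, we can audit the SSH daemon configuration. This is a minimal sketch, assuming the standard OpenSSH file (typically /etc/ssh/sshd_config). It only handles the whitespace-separated "Keyword value" form, ignores Include directives, and stops at the first Match block, so conditional overrides are not evaluated.

```python
def audit_sshd_config(text: str) -> list[str]:
    """Return findings about password vs. public key authentication."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and lines starting with '#' are comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # settings inside Match blocks are conditional; not handled here
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        # sshd uses the first value obtained for each keyword
        settings.setdefault(keyword, value)

    findings = []
    # OpenSSH defaults both options to "yes" when they are not set explicitly
    if settings.get("passwordauthentication", "yes") != "no":
        findings.append("PasswordAuthentication is not disabled")
    if settings.get("pubkeyauthentication", "yes") != "yes":
        findings.append("PubkeyAuthentication is disabled")
    return findings
```

Usage on the VM: `print(audit_sshd_config(open("/etc/ssh/sshd_config").read()))`. An empty list means password authentication is disabled and public key authentication is left on.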

Network Configuration

  • Example Setup 1:
    The Guzzle VM must be accessible only from selected IP ranges, such as the Azure ADF service IP ranges, whitelisted on-premises IP ranges, etc.
  • Example Setup 2:
    All inbound network rules are blocked for the Guzzle VM. The Guzzle VM is accessible only from a bastion machine running in the same VNet. Access to the bastion machine can be restricted to whitelisted on-premises IP ranges, etc. For Azure Data Factory to access Guzzle, the ADF integration runtime must run on a Windows machine in the same VNet as the Guzzle VM.
  • Once Guzzle installation is done, add outbound rules that block all outbound requests other than the required ones.
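The rules above rely on how network security group (NSG) rules are evaluated: rules are checked in ascending priority order and the first match wins. The sketch below models that with a simplified in-memory rule list; the CIDR ranges are documentation ranges standing in for real ADF or on-premises ranges, and the two default rules are simplified stand-ins for Azure's built-in defaults (for example, the real "Internet" tag is not exactly 0.0.0.0/0). It shows why an explicit outbound deny is needed: the defaults allow outbound Internet traffic.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    priority: int     # lower numbers are evaluated first
    direction: str    # "Inbound" or "Outbound"
    remote_cidr: str  # source range for inbound, destination range for outbound
    action: str       # "Allow" or "Deny"

def evaluate(rules: list[Rule], direction: str, remote_ip: str) -> tuple[str, str]:
    """Return (action, rule name) of the first matching rule by priority."""
    ip = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction == direction and ip in ipaddress.ip_network(rule.remote_cidr):
            return rule.action, rule.name
    return "Deny", "no matching rule"

# Simplified stand-ins for the Azure default rules
DEFAULTS = [
    Rule("DenyAllInBound", 65500, "Inbound", "0.0.0.0/0", "Deny"),
    Rule("AllowInternetOutBound", 65001, "Outbound", "0.0.0.0/0", "Allow"),
]

# Hypothetical custom rules (documentation ranges, not real Guzzle values)
CUSTOM = [
    Rule("AllowOnPremInbound", 100, "Inbound", "203.0.113.0/24", "Allow"),
    Rule("AllowRequiredOutbound", 100, "Outbound", "198.51.100.0/24", "Allow"),
    Rule("DenyOtherOutbound", 4000, "Outbound", "0.0.0.0/0", "Deny"),
]
```

With `DEFAULTS + CUSTOM`, traffic from 203.0.113.7 is allowed in, everything else inbound is denied, and outbound traffic is allowed only to the required range. With `DEFAULTS` alone, outbound traffic to any Internet address is allowed.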

Mounting the blob storage account for Guzzle

  • We use Blobfuse to mount blob storage on the Guzzle VM. The Blobfuse config file contains the access key of the Guzzle blob storage account in plain text, so it must be created as the root user and other users must not have access to it. Mounting of the Guzzle blob storage must be done by the root user only. To remount the Guzzle blob storage automatically when the Guzzle VM restarts, use a crontab entry of the root user.
  • Regular Linux users will not be able to view the fuse config file, although they can delete all files in the mounted Guzzle blob storage directory.
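To confirm that the Blobfuse config file is locked down as described, we can check its owner and permission bits. A minimal sketch, assuming the file should be owned by root (uid 0) with no group or other permissions (for example mode 0600); the config file path is whatever was used on the VM and is not specified here.

```python
import os
import stat

def check_secret_file(path: str, expected_uid: int = 0) -> list[str]:
    """List problems with the ownership and permission bits of a secret file.

    expected_uid defaults to 0 (root), since the Blobfuse config file
    must be created by root and be unreadable by other users.
    """
    st = os.stat(path)
    problems = []
    if st.st_uid != expected_uid:
        problems.append(f"owned by uid {st.st_uid}, expected uid {expected_uid}")
    # Any read/write/execute bit for group or others exposes the access key
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"group/other permissions are set: {stat.filemode(st.st_mode)}")
    return problems
```

An empty result means only the owner can access the file. If a permission problem is reported, running `chmod 600` on the file as root clears it.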

Guzzle database authentication

Multiple authentication mechanisms are supported when Azure SQL is used as the Guzzle repository database:

  • Username/password: the common authentication mechanism, where the database username and password are specified as credentials
  • Azure AD: uses an Azure AD username and password as credentials for authentication
  • Azure Service Principal: uses service principal credentials for authentication
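At the JDBC level, these three mechanisms correspond to values of the Microsoft JDBC driver's `authentication` connection property. The sketch below is illustrative only: the mechanism keys and the function name are hypothetical, and it does not show how Guzzle's own repository settings are named. It assumes a driver version recent enough to support service principal authentication.

```python
# Hypothetical mechanism names mapped to values of the Microsoft JDBC
# driver's "authentication" connection property
AUTHENTICATION = {
    "username_password": "SqlPassword",
    "azure_ad": "ActiveDirectoryPassword",
    "service_principal": "ActiveDirectoryServicePrincipal",
}

def azure_sql_jdbc_url(server: str, database: str, mechanism: str) -> str:
    """Build an Azure SQL JDBC URL for the given mechanism.

    Credentials are passed separately as the driver's user/password
    properties rather than embedded in the URL.
    """
    auth = AUTHENTICATION[mechanism]
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;trustServerCertificate=false;"
        f"hostNameInCertificate=*.database.windows.net;"
        f"authentication={auth}"
    )
```

For the service principal case, the user is the application (client) ID and the password is the client secret.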

Secret tokens in Guzzle configurations

  • The Guzzle passphrase file ($GUZZLE_HOME/passphrase) contains the secret passphrase used to encrypt/decrypt string values such as JDBC passwords, service principal secrets, Databricks workspace authentication tokens, etc.
  • The JWT secret key in application.yml of the Guzzle API application is used to sign authentication tokens. These signed tokens are used to call the Guzzle REST APIs.
  • Both of these config files are stored in blob storage and are accessible to all users.

Enable SSL

The Guzzle API and Web applications must run over SSL so that all traffic from clients (browser, Azure Data Factory, etc.) to the server (applications running on the Guzzle VM) is secured.
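On the client side, SSL only protects the traffic if the client verifies the server certificate. A minimal sketch for a Python script calling the Guzzle API over HTTPS, assuming TLS 1.2 as the lowest acceptable version. The optional CA file is for a certificate issued by a private CA; how the Guzzle applications themselves are configured for SSL is not shown.

```python
import ssl
from typing import Optional

def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a client calling the Guzzle API or Web application."""
    # create_default_context() turns on certificate verification and
    # hostname checking; ca_file can point to a private CA bundle
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0/1.1
    return ctx
```

The context can be passed to `urllib.request.urlopen(url, context=make_client_context())`, where `url` is the HTTPS address of the Guzzle application.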

Enable single sign-on

We can enable single sign-on to authenticate users in Guzzle with their Azure AD credentials.
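On the Azure AD side, single sign-on of this kind is a standard OpenID Connect sign-in against the Microsoft identity platform. A minimal sketch of the first step, building the authorization URL, assuming an Azure AD app registration for Guzzle. The tenant ID, client ID, and redirect URI are hypothetical, and Guzzle's own SSO settings are not shown.

```python
import secrets
from urllib.parse import urlencode

def azure_ad_authorize_url(tenant_id: str, client_id: str, redirect_uri: str) -> tuple[str, str]:
    """Return (authorization URL, state) for the OpenID Connect code flow."""
    # Random state value; it must match when Azure AD redirects the user back
    state = secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "response_mode": "query",
        "scope": "openid profile email",
        "state": state,
    }
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize?{urlencode(params)}"
    return url, state
```

After the user signs in, Azure AD redirects to the redirect URI with a `code`. The application then exchanges that code at the tenant's `/oauth2/v2.0/token` endpoint for tokens identifying the user.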

Guzzle application admin password

Once Guzzle is installed, it uses a default username and password for the admin user. This admin password must be changed to a more complex one, or the default admin user must be removed if single sign-on is enabled.
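Before setting the new admin password, we can screen it against a complexity policy. The policy below is hypothetical: a minimum length of 14 and at least three character classes. Because the default credentials are not listed on this page, known default passwords are passed in by the caller.

```python
import string

def password_problems(password: str, known_defaults: set[str], min_length: int = 14) -> list[str]:
    """Check a candidate admin password against a hypothetical policy."""
    problems = []
    if password in known_defaults:
        problems.append("password is a known default")
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    # Count how many character classes appear at least once
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    if sum(classes) < 3:
        problems.append("uses fewer than 3 of: lowercase, uppercase, digits, symbols")
    return problems
```

An empty list means the candidate passes this policy. Organizations with their own password rules would adjust the length and class requirements.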

Databricks workspace

VNet deployment

The Databricks workspace must be deployed in a VNet. Once Guzzle installation is done, add outbound rules that block all outbound requests other than the required ones.

Mounting storage accounts

When mounting storage accounts on Databricks workspaces from a notebook, first create secret scopes for access tokens, service principal secrets, etc., and reference those secrets in the notebook.
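For example, when mounting an ADLS Gen2 container with a service principal, the client secret can be read from a secret scope instead of being pasted into the notebook. A minimal sketch, assuming OAuth client-credentials authentication. The helper name, scope, and key names are hypothetical, and the secret getter is passed in so the function can run outside Databricks.

```python
from typing import Callable

def adls_mount_configs(get_secret: Callable[..., str], scope: str, client_id: str,
                       secret_key: str, tenant_id: str) -> dict:
    """Build extra_configs for dbutils.fs.mount with a service principal
    whose client secret comes from a Databricks secret scope."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        # The secret value is fetched at run time, never written in the notebook
        "fs.azure.account.oauth2.client.secret": get_secret(scope=scope, key=secret_key),
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In a Databricks notebook (scope, key and names are hypothetical):
# configs = adls_mount_configs(dbutils.secrets.get, "guzzle-scope", "<application-id>",
#                              "sp-client-secret", "<tenant-id>")
# dbutils.fs.mount(source="abfss://<container>@<account>.dfs.core.windows.net/",
#                  mount_point="/mnt/<mount-name>", extra_configs=configs)
```

Passing `dbutils.secrets.get` as the getter keeps the secret value out of the notebook source. Databricks also redacts secret values that are printed in notebook output.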

External Hive metastore password

If an external Hive metastore is used, store the database password as a Databricks secret and reference that secret in the external Hive metastore configuration of the Databricks cluster.
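Databricks cluster Spark configuration accepts secret references of the form `{{secrets/<scope>/<key>}}`, which are resolved when the cluster starts. A minimal sketch that builds the metastore connection entries for an Azure SQL metastore database. The function name, scope, and key are hypothetical, and the `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` entries are omitted because they depend on the metastore setup.

```python
def hive_metastore_spark_conf(jdbc_url: str, user: str, scope: str, key: str) -> dict:
    """Spark config entries for an external Hive metastore on Azure SQL.

    The password entry is a Databricks secret reference rather than the
    plain-text password, so the password never appears in the cluster config.
    """
    return {
        "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
        "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
        "spark.hadoop.javax.jdo.option.ConnectionPassword": "{{secrets/" + scope + "/" + key + "}}",
        "spark.hadoop.javax.jdo.option.ConnectionDriverName":
            "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
```

Each key/value pair goes into the cluster's Spark config as one `key value` line.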

SQL Server

SQL Server databases must be accessible only within the virtual network, using either a service endpoint or a private endpoint.

Service endpoint:
1. Enable service endpoints on the guzzlevm, private-databricks, and public-databricks subnets.
2. Add these subnets to the SQL Server firewall rules.
3. An outbound rule for the 'Sql' destination is required in the Databricks and virtual machine NSGs.

Private endpoint:
1. Add a private endpoint for the SQL server in a subnet.
2. No outbound rule for the 'Sql' destination is needed in the Databricks or virtual machine NSGs.
3. An outbound rule for VNet-internal traffic is required.
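One way to confirm which path is in use: with a private endpoint (and its private DNS zone, privatelink.database.windows.net), the server hostname resolves to a private address inside the VNet. That is why only a VNet-internal outbound rule is needed. With a service endpoint, the hostname still resolves to the public address of the service. A minimal sketch, meant to be run on the Guzzle VM. The server name and VNet address space are hypothetical.

```python
import ipaddress
import socket

def resolve_ipv4(hostname: str) -> str:
    """Resolve a hostname from inside the VNet (for example, on the Guzzle VM)."""
    return socket.gethostbyname(hostname)

def resolves_into_vnet(ip: str, vnet_cidr: str) -> bool:
    """True if the resolved address lies in the VNet address space, as
    expected for a working private endpoint."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(vnet_cidr)

# Usage on the Guzzle VM (server name and address space are hypothetical):
# ip = resolve_ipv4("myserver.database.windows.net")
# print(resolves_into_vnet(ip, "10.0.0.0/16"))
```

If the check returns False after a private endpoint was added, the private DNS zone is usually not linked to the VNet, so clients keep resolving the public address.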

Storage Account

Azure storage accounts must be accessible only within the virtual network, using either a service endpoint or a private endpoint. Follow steps similar to those for the SQL Server service/private endpoint (not verified).
