Hybrid Data Pipeline
- The agreement with Databricks will bundle three things
- Since it is revenue sharing and OEM, they are ready to include an additional component. The term used is "Super Bucket", which includes any future JDBC 1-tier (direct) connectors that are built and released.
- They are trying to bring parity between the data sources they support as 1-tier connectors and those bundled in HDP; only 1 or 2 candidates are missing.
- ODBC and JDBC connectors again have a huge overlap; all the new connectors are built on HDP.
Data Passing in HDP:
- The data source credentials are protected by encryption: AES-256 at rest and TLS in transit. See the second-to-last page of the Security White Paper for more details.
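To make the AES-256-at-rest point concrete, here is a minimal, illustrative Java sketch of encrypting a data source password with AES-256-GCM before it is written to disk. This is not HDP's actual implementation; the cipher mode, key handling, and all names below are assumptions for the example only.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class CredentialEncryptionSketch {
    public static void main(String[] args) throws Exception {
        // Generate a 256-bit AES key (illustrative only; HDP manages its own keys).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // Random 96-bit IV, the recommended size for GCM.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        // Encrypt the data source password before it is stored at rest.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] cipherText = cipher.doFinal("my-datasource-password".getBytes(StandardCharsets.UTF_8));

        System.out.println("Encrypted credential: " + Base64.getEncoder().encodeToString(cipherText));
    }
}
```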
Communication between client & server:
- The Hybrid Data Pipeline installation generates redist files, which are used to install the JDBC, ODBC, and OnPremise connectors. The JDBC connector installation creates a Java KeyStore file inside the <install_dir>/sslcertificates folder, which can be referenced in the JDBC URL (or via the JVM's trust store settings) to connect to the HDP server, as sketched below.
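A hedged sketch of how a client might use that KeyStore when connecting over JDBC. The connection URL format, option names, host, port, paths, and credentials below are assumptions for illustration only; check the HDP JDBC driver documentation shipped with the redist files for the exact syntax.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HdpJdbcConnectSketch {
    public static void main(String[] args) throws Exception {
        // Point the JVM at the KeyStore shipped with the redist files so the driver
        // can validate the HDP server's TLS certificate. Path and password are placeholders.
        System.setProperty("javax.net.ssl.trustStore", "/opt/hdp/redist/sslcertificates/hdp.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "changeit");

        // URL format and connection options are assumptions - verify against the
        // installed HDP JDBC driver documentation.
        String url = "jdbc:datadirect:ddhybrid://hdp-server:8443;"
                + "hybridDataPipelineDataSource=MyDataSource;EncryptionMethod=SSL";

        try (Connection con = DriverManager.getConnection(url, "hdpuser", "hdppassword");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```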
Authentication between server & OPC:
- A unique random AES-256 encryption key is generated for each HDP single-instance or HDP cluster installation.
- The key, with an added salt value, is encrypted using a master AES-256 encryption key and placed in the OnPremise.properties file (one of the redist files mentioned above) as the AuthKey value.
- During installation of the OPC, the user password provided to the OPC installer is encrypted with a salt, using the unique HDP encryption key taken from the AuthKey value, and then placed in the OnPremise.properties file as the Auth value.
- All user passwords are stored as one-way SHA-256 hashes with a per-user salt (see the conceptual sketch after this list).
- Azure Application Gateway uses WebSockets for communication by default, and both the client and the OPC are able to connect to the HDP server through it without any issue.
- However, an SSL connection between the load balancer (Application Gateway) and the Hybrid Data Pipeline nodes is currently not supported.
- The load balancer is not able to handle node failure for in-flight requests.
- However, for the next request it re-establishes the connection to one of the online nodes and processes the request successfully.
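To illustrate the per-user salted hashing mentioned in the list above, here is a minimal Java sketch of a salted SHA-256 one-way hash. It is a conceptual example only: HDP's actual salt length, iteration scheme, and storage format are not specified in these notes, and the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

public class PasswordHashSketch {
    // Hash a password with a per-user random salt, as a conceptual illustration
    // of "SHA-256 one-way hash with per-user salt".
    public static String hash(String password, byte[] salt) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(salt);
        byte[] hashed = digest.digest(password.getBytes(StandardCharsets.UTF_8));
        // Store the salt alongside the hash so the same computation can be repeated at login.
        return Base64.getEncoder().encodeToString(salt) + ":" + Base64.getEncoder().encodeToString(hashed);
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);   // unique salt per user
        System.out.println(hash("hdp-user-password", salt));
    }
}
```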
This includes the ones that we have asked for as well:
- SAP HANA
- Dynamics
- Microsoft Access (this we need for
- Shall we also include ODBC and then use a JDBC-ODBC bridge driver to connect to some of the sources that are JDBC only?