EMR 006 Enable SSL for Hive Server2 with LDAP Authentication

The following procedure was tested on an EMR cluster (release 5.9.0) with Hive 2.3.0. All commands are executed on the master node. Note that the procedure assumes the EMR cluster does not use Kerberos authentication.

(1) Create a folder to store the keys

sudo mkdir /keys
sudo chown -R hadoop:hadoop /keys
cd /keys

(2) Generate certificates. This procedure uses "hadoop" whenever a password is requested. You can substitute your own password.

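In the following command, when prompted for your first and last name, enter the value of $host, which is the DNS name of the EMR master node (for example, ip-172-32-1-69). This value becomes the common name (CN) of the certificate, so it should match the host name that clients use in the JDBC URL.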

host="ip-172-32-1-69" # This is only an example!
keytool -genkey -alias $host -keyalg RSA -keystore keystore.jks -keysize 2048
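For scripted setups, the interactive prompts can be avoided. The following is a minimal sketch, assuming the same example host name and password as above. The function name generate_keystore and the 365-day validity are illustrative choices, not part of the tested procedure.

```shell
# Example values from this tutorial; replace them with your own.
host="ip-172-32-1-69"
storepass="hadoop"

# Distinguished name whose CN matches the host name clients connect to.
dname="CN=${host}"

# Hypothetical helper: create the key pair without any prompts.
generate_keystore() {
  # -dname answers the identity questions;
  # -storepass and -keypass answer the password questions.
  # -validity 365 is an assumption (keytool's default is shorter).
  keytool -genkeypair -alias "$host" -keyalg RSA -keysize 2048 \
    -dname "$dname" -validity 365 \
    -keystore keystore.jks -storepass "$storepass" -keypass "$storepass"
}

# Only run when keytool is installed and no keystore exists yet.
if command -v keytool >/dev/null 2>&1 && [ ! -f keystore.jks ]; then
  generate_keystore || echo "keytool failed; see the message above" >&2
fi
```

Run it from /keys so that keystore.jks lands in the same place as in the interactive procedure.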

List the certificates in your key store:

keytool -list -keystore keystore.jks

Export the certificate:

keytool -export -alias $host -file $host.crt -keystore keystore.jks

Add this certificate to the trust store, which will be used by the client:

keytool -import -trustcacerts -alias $host -file $host.crt -keystore truststore.jks

List the certificates in your trust store:

keytool -list -keystore truststore.jks
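The export and import steps can also be run without prompts. This is a sketch under the same assumptions (example host name, password "hadoop", current directory /keys); the helper name export_and_trust is hypothetical.

```shell
# Example values from this tutorial; replace them with your own.
host="ip-172-32-1-69"
storepass="hadoop"
cert_file="${host}.crt"

# Hypothetical helper: export the server certificate and trust it.
export_and_trust() {
  # Export the certificate stored under the alias $host.
  keytool -exportcert -alias "$host" -file "$cert_file" \
    -keystore keystore.jks -storepass "$storepass" || return 1
  # -noprompt skips the "Trust this certificate?" question.
  keytool -importcert -noprompt -trustcacerts -alias "$host" \
    -file "$cert_file" -keystore truststore.jks -storepass "$storepass" || return 1
  # List the trust store to confirm the entry is present.
  keytool -list -keystore truststore.jks -storepass "$storepass"
}

# Only run when a keystore exists and no trust store has been created yet.
if command -v keytool >/dev/null 2>&1 && [ -f keystore.jks ] && [ ! -f truststore.jks ]; then
  export_and_trust || echo "keytool failed; see the message above" >&2
fi
```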

(3) Modify /etc/hive/conf/hive-site.xml and add the following properties:

<property>
  <name>hive.server2.use.SSL</name>
  <value>true</value>
  <description>enable/disable SSL </description>
</property>
 
<property>
  <name>hive.server2.keystore.path</name>
  <value>/keys/keystore.jks</value>
  <description>path to keystore file</description>
</property>

<property>
  <name>hive.server2.keystore.password</name>
  <value>hadoop</value>
  <description>keystore password</description>
</property>

Restart hive-server2:

[hadoop@ip-172-32-1-69 keys]$ sudo stop hive-server2; sudo start hive-server2
hive-server2 stop/waiting
hive-server2 start/running, process 15299
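Before moving on to a JDBC client, it can help to confirm that the port now answers with a TLS handshake. The sketch below assumes that openssl is installed on the master node and that Hive Server2 listens on the default port 10000; the helper name check_tls is hypothetical.

```shell
# Example values from this tutorial; replace them with your own.
host="ip-172-32-1-69"
port=10000
endpoint="${host}:${port}"

# Hypothetical helper: show the subject and validity dates of the
# certificate that the server presents during the TLS handshake.
check_tls() {
  # Redirecting stdin from /dev/null makes s_client exit after the handshake.
  openssl s_client -connect "$endpoint" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -dates
}

echo "Call check_tls to inspect the certificate served at ${endpoint}"
```

If the subject shows the CN entered in step (2), the keystore configured in hive-site.xml is the one being served. Give the service a moment to start listening after the restart.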

(4) Test connection with Beeline

[hadoop@ip-172-32-1-69 keys]$ beeline
Beeline version 2.3.0-amzn-0 by Apache Hive
beeline> !connect jdbc:hive2://ip-172-32-1-69:10000/default;ssl=true;sslTrustStore=/keys/truststore.jks;trustStorePassword=hadoop
Connecting to jdbc:hive2://ip-172-32-1-69:10000/default;ssl=true;sslTrustStore=/keys/truststore.jks;trustStorePassword=hadoop
Enter username for jdbc:hive2://ip-172-32-1-69:10000/default: hive
Enter password for jdbc:hive2://ip-172-32-1-69:10000/default: 
Connected to: Apache Hive (version 2.3.0-amzn-0)
Driver: Hive JDBC (version 2.3.0-amzn-0)
Transaction isolation: TRANSACTION_REPEATABLE_READ

As you can see, you are now accessing Hive over JDBC (using Beeline) with SSL enabled.
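The same connection can be scripted. This is a minimal sketch that uses Beeline's -u, -n, -p, and -e options; the variable names, the run_query helper, and the placeholder password are assumptions (the interactive session above leaves the password blank).

```shell
# Example values from this tutorial; replace them with your own.
host="ip-172-32-1-69"
truststore="/keys/truststore.jks"
truststore_pass="hadoop"

# Same JDBC URL as in the interactive session above.
jdbc_url="jdbc:hive2://${host}:10000/default;ssl=true;sslTrustStore=${truststore};trustStorePassword=${truststore_pass}"

# User and password can be overridden from the environment.
HIVE_USER="${HIVE_USER:-hive}"
HIVE_PASSWORD="${HIVE_PASSWORD:-placeholder}"   # assumption: avoids the password prompt

# Hypothetical helper: run one statement and exit.
run_query() {
  beeline -u "$jdbc_url" -n "$HIVE_USER" -p "$HIVE_PASSWORD" -e "$1"
}

echo "$jdbc_url"
```

For example, run_query "show databases;" should list the databases when the SSL setup is correct. The URL must be quoted because it contains semicolons, which the shell would otherwise treat as command separators.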

(5) Test connection with SQL Workbench

In SQL Workbench, set up the Hive Server2 JDBC driver as described in the following AWS documentation:

Use the Hive JDBC Driver

From the EMR master node, copy /keys/truststore.jks to the Windows instance running SQL Workbench as C:\truststore.jks.

In SQL Workbench, use the following JDBC connection URL:

jdbc:hive2://ip-172-32-1-69:10000/default;ssl=1;sslTrustStore=C:\truststore.jks;trustStorePassword=hadoop

You are now accessing Hive over JDBC (using SQL Workbench) with SSL enabled.

(6) Set up FreeIPA as the LDAP server. Here we assume that you use ipa.example.com as the FreeIPA domain name. You will need to set up two private hosted zones to provide DNS resolution and reverse DNS resolution for ipa.example.com.
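Forward and reverse resolution can be checked from the master node before Hive is reconfigured. The following sketch assumes a Linux host where getent is available; the helper name check_dns is hypothetical.

```shell
# Example FreeIPA host name from this tutorial.
ipa_host="ipa.example.com"

# Hypothetical helper: resolve a name to an address, then resolve
# that address back to a name.
check_dns() {
  # Forward lookup: first address returned for the name.
  ip=$(getent hosts "$1" | awk '{print $1; exit}')
  if [ -z "$ip" ]; then
    echo "forward lookup failed for $1" >&2
    return 1
  fi
  # Reverse lookup: first name returned for that address.
  name=$(getent hosts "$ip" | awk '{print $2; exit}')
  echo "$1 -> $ip -> ${name:-no reverse record}"
}

# Usage: check_dns "$ipa_host"
```

If both hosted zones are in place, the name at the end of the output line should be ipa.example.com again.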

(7) Add the following configuration to /etc/hive/conf/hive-site.xml:

<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>

<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://ipa.example.com:389</value>
</property>

<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>cn=users,cn=accounts,dc=example,dc=com</value>
</property>
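It can be useful to test the LDAP credentials directly before restarting Hive. The sketch below assumes that the OpenLDAP client tools (ldapsearch) are installed and that the bind DN has the form uid=<user>,<baseDN>, which is how a FreeIPA user entry under this base DN is normally named. The helper names are hypothetical.

```shell
# Values from the configuration above.
ldap_url="ldap://ipa.example.com:389"
base_dn="cn=users,cn=accounts,dc=example,dc=com"

# Build the bind DN for a user (assumption: uid=<user>,<baseDN>).
bind_dn_for() {
  printf 'uid=%s,%s\n' "$1" "$base_dn"
}

# Hypothetical helper: bind as the user and read the user's own entry.
check_ldap_login() {
  user_dn=$(bind_dn_for "$1")
  # -x: simple bind, -W: prompt for the password,
  # -s base: read only the entry named by -b.
  ldapsearch -x -H "$ldap_url" -D "$user_dn" -W \
    -b "$user_dn" -s base "(objectClass=*)" dn
}

# Usage: check_ldap_login ldapuser
```

A successful bind prints the user's DN. An "Invalid credentials" error points to a wrong password or a bind DN that does not match the directory layout.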

Restart hive-server2:

[hadoop@ip-172-32-1-69 keys]$ sudo stop hive-server2; sudo start hive-server2
hive-server2 stop/waiting
hive-server2 start/running, process 15299

Now you should be able to connect to Hive Server2 using your LDAP credentials.

[hadoop@ip-172-32-1-69 keys]$ beeline
Beeline version 2.3.0-amzn-0 by Apache Hive
beeline> !connect jdbc:hive2://ip-172-32-1-69:10000/default;ssl=true;sslTrustStore=/keys/truststore.jks;trustStorePassword=hadoop
Connecting to jdbc:hive2://ip-172-32-1-69:10000/default;ssl=true;sslTrustStore=/keys/truststore.jks;trustStorePassword=hadoop
Enter username for jdbc:hive2://ip-172-32-1-69:10000/default: ldapuser
Enter password for jdbc:hive2://ip-172-32-1-69:10000/default: ***********
Connected to: Apache Hive (version 2.3.0-amzn-0)
Driver: Hive JDBC (version 2.3.0-amzn-0)
Transaction isolation: TRANSACTION_REPEATABLE_READ