HDFS "Data at Rest" encryption - stanislawbartkowski/wikis GitHub Wiki

HDFS encryption

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/configuring-hdfs-encryption/content/hdfs_data_at_rest_encryption.html

HDFS encryption is one of the security features of HDP/Hadoop. "Wire encryption" protects data "in motion", while HDFS encryption protects data "at rest". It allows HDP/Hadoop to be deployed in environments where sensitive data is stored, processed and analyzed. It is important to underline that HDFS itself never has access to decrypted data; it only manages the encrypted data. Only an authorized user can see the unencrypted data.

On the other hand, HDFS encryption adds another layer of complexity and must be managed carefully. The model is flexible, because not all data requires such a high level of protection: HDFS lets you designate an encrypted "safety zone" for sensitive data, alongside "standard zones" where data is stored in the normal way.

Install Ranger KMS

Installation

Ranger Key Management System (KMS) is an extension to Ranger that administers the cryptographic keys used for data encryption and decryption.

Installing Ranger KMS using the Ambari wizard is straightforward.

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/installing-ranger-kms/content/installing_the_ranger_key_management_service.html

Ranger KMS Web UI

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/configuring-hdfs-encryption/content/using_the_ranger_key_management_service.html

The Ranger KMS Web UI is accessed at the same URL as the Ranger Web UI (<ranger_admin_host>:6080). The only difference is that you log in as the keyadmin user. After successful authentication, the user is forwarded to the KMS-specific panel. The default password for the keyadmin user is keyadmin, but it does not always work as expected. The Ranger KMS Web UI relies on the Ranger authentication method, which is available from the Ranger panel: Ranger->Configs->Advanced->Ranger Setting.

If the AD authentication method is used, the keyadmin user should be created in the Active Directory tree, and its AD password is used to authenticate in the Ranger KMS UI.

If the Ranger authentication method is Unix, the keyadmin user should be created as a local Linux user on the node where the "Ranger User Sync" service is running. Also, the /etc/shadow file must be readable by the ranger user; the command below achieves this by making the file world-readable.

chmod 444 /etc/shadow

When authenticating in the Ranger KMS UI, use the keyadmin Linux password.

Test

Create a secure key

Create the "safezone" key.
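As an alternative to a Web UI, the key can probably also be created with the `hadoop key` command line tool. The sketch below assumes that the cluster's key provider already points at Ranger KMS and that the calling user is allowed to create keys; the key size of 128 bits is an assumption, not a value taken from this page.

```shell
# Sketch: create the encryption key from the command line.
# Assumptions: Ranger KMS is the configured key provider and the
# current user has the right to create keys there.
KEY_NAME="safezone"
KEY_SIZE=128   # assumed key length in bits

if command -v hadoop >/dev/null 2>&1; then
    # Create the key, then list keys with metadata to confirm it exists.
    hadoop key create "$KEY_NAME" -size "$KEY_SIZE"
    hadoop key list -metadata
else
    echo "hadoop CLI not found; run this on a cluster node" >&2
fi
```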

Create HDFS safe zone

hdfs dfs -mkdir /zone_encr
hdfs crypto -createZone -keyName safezone -path /zone_encr
hdfs crypto -listZones

zone_encr  safezone 
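To check the zone from a script rather than by eye, the listing can be filtered for a given path. This is a minimal sketch that assumes the output of `hdfs crypto -listZones` has the zone path in the first column and the key name in the second; `zone_key` is a hypothetical helper name.

```shell
# zone_key: read `hdfs crypto -listZones` output on stdin and print the
# key name of the given zone path. Exit status is non-zero when the
# path is not listed as an encryption zone.
zone_key() {
    awk -v zone="$1" '$1 == zone { print $2; found = 1 } END { exit !found }'
}

# Example (on a cluster node, as a user allowed to list zones):
#   hdfs crypto -listZones | zone_key /zone_encr
```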

Create test users

| User | Group | Expected privileges |
|---|---|---|
| user2 | dataadmin | Can read and write in the encrypted zone |
| user3 | datascience | Can only read data from the encrypted zone |
| user1 | no group | Malicious user, the encrypted zone is hidden from him |
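The page does not say how the test users are created. One possible way, assuming local Linux accounts on the relevant nodes (and that Ranger User Sync later picks them up), is sketched below. The `run` and `create_test_users` helpers are hypothetical; the script defaults to a dry run that only prints the commands, and needs root when run with `DRY_RUN=0`.

```shell
# run: print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the two groups and the three test users from the table above.
create_test_users() {
    run groupadd dataadmin
    run groupadd datascience
    run useradd -m -G dataadmin user2
    run useradd -m -G datascience user3
    run useradd -m user1          # no extra group: the "malicious" user
}

create_test_users
```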

Grant HDFS privileges

Using the Ranger Web UI, give the dataadmin group full HDFS access to /zone_encr and the datascience group read-only access. As a test, the malicious user1 is also given full HDFS access to /zone_encr.

Create encryption policy

Using the Ranger KMS Web UI, prepare a policy for the dataadmin and datascience groups. The malicious user1 is not included here. Because the datascience group is supposed to only read data, it is enough to grant it the Decrypt EEK privilege.

As user2

Create and read data in the secure zone.

echo "Top secret content" >secret.txt
hdfs dfs -copyFromLocal secret.txt /zone_encr
hdfs dfs -cat /zone_encr/secret.txt

Top secret content
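To see that only ciphertext is stored, the file can be read through the `/.reserved/raw` virtual path, which returns the raw stored bytes and is available only to the HDFS superuser. The sketch below assumes it is run as that superuser; `raw_path` is a hypothetical helper name.

```shell
# raw_path: map an HDFS path to its /.reserved/raw counterpart, which
# exposes the stored (still encrypted) bytes to the HDFS superuser.
raw_path() {
    printf '/.reserved/raw%s\n' "$1"
}

if command -v hdfs >/dev/null 2>&1; then
    # Dump the first bytes of the stored file; they should not contain
    # the plain text written by user2.
    hdfs dfs -cat "$(raw_path /zone_encr/secret.txt)" | od -c | head -n 5
fi
```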

As user3

user3 can read data but cannot modify anything in the secure zone. Modification is blocked by the HDFS policy.

hdfs dfs -cat /zone_encr/secret.txt

Top secret content

hdfs dfs -rm /zone_encr/secret.txt

rm: Failed to move to trash: hdfs://mdp1.sb.com:8020/zone_encr/secret.txt: Permission denied: user=user3, access=WRITE, inode="/zone_encr":hdfs:hdfs:drwxr-xr-x

echo "Next secret" > secret1
hdfs dfs -copyFromLocal secret1 /zone_encr

copyFromLocal: Permission denied: user=user3, access=WRITE, inode="/zone_encr":hdfs:hdfs:drwxr-xr-x

As user1

Because user1 has full HDFS privileges in /zone_encr, without encryption this user would be able to read and update data in this directory. The data, however, has an additional layer of protection.

hdfs dfs -cat /zone_encr/secret.txt

cat: User:user1 not allowed to do 'DECRYPT_EEK' on 'safezone'

hdfs dfs -copyToLocal /zone_encr/secret.txt

copyToLocal: User:user1 not allowed to do 'DECRYPT_EEK' on 'safezone'
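The two kinds of refusal seen in these tests come from different layers: the HDFS policy for user3 and the key policy for user1. A small helper can tell them apart when the test is scripted. This is a sketch that matches only the message texts shown on this page; `denial_layer` is a hypothetical name.

```shell
# denial_layer: classify an hdfs client error message by the layer
# that refused the request.
denial_layer() {
    case "$1" in
        *"not allowed to do 'DECRYPT_EEK'"*) echo "kms" ;;      # key policy refused decryption
        *"Permission denied"*)               echo "hdfs" ;;     # HDFS access policy refused the operation
        *)                                   echo "unknown" ;;
    esac
}

# Example (on a cluster node):
#   msg="$(hdfs dfs -cat /zone_encr/secret.txt 2>&1 >/dev/null)"
#   denial_layer "$msg"
```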