HDFS "Data at Rest" encryption - stanislawbartkowski/wikis GitHub Wiki
HDFS encryption
HDFS encryption is one of the security features of HDP/Hadoop. "Wire encryption" protects data "in motion", while HDFS encryption protects data "at rest". It allows HDP/Hadoop to be deployed in environments where sensitive data is stored, processed and analyzed. It is important to underline that HDFS itself never has access to the decrypted data; it only manages the encrypted data. Only an authorized user can see the unencrypted data.
On the other hand, HDFS encryption adds another layer of complexity and should be managed carefully. The model is flexible because not all data requires such a high level of protection. HDFS allows designing a "safe zone" where sensitive data is encrypted and "standard zones" where data is stored in the normal way.
Install Ranger KMS
Installation
Ranger Key Management System (KMS) is an extension to Ranger that allows administering the cryptographic keys used for data encryption and decryption.
Installing Ranger KMS using the Ambari wizard is straightforward.
Ranger KMS Web UI
The Ranger KMS Web UI is accessed using the same URL as the Ranger Web UI (<ranger_admin_host>:6080). The only difference is that you log in as the keyadmin user. After successful authentication, the user is forwarded to the KMS-specific panel. The default password for the keyadmin user is keyadmin, but it does not always work as expected. The Ranger KMS Web UI uses the authentication method configured for Ranger, which is available in the Ranger panel: Ranger->Configs->Advanced->Ranger Setting.
If the AD authentication method is used, the keyadmin user should be created in the Active Directory tree, and its AD password is used to authenticate in the Ranger KMS UI.
If the Ranger authentication method is Unix, the keyadmin user should be created as a local Linux user on the node where the "Ranger User Sync" service is running. In addition, the /etc/shadow file should be readable by the ranger user.
chmod 444 /etc/shadow
When authenticating in the Ranger KMS UI, use the Linux password of the keyadmin user.
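Mode 444 makes the password hashes in /etc/shadow readable by every local user. A narrower alternative, which is not part of the original setup and has not been verified against Ranger here, is to grant read access to the ranger account only through a POSIX ACL. The sketch below assumes the acl tools (setfacl, getfacl) are installed, the filesystem supports ACLs, and the service account is named ranger. The function name grant_shadow_read is hypothetical.

```shell
# Grant read access on /etc/shadow to a single account via a POSIX ACL,
# then print the resulting ACL so the change can be reviewed.
# Assumption: run as root; the account defaults to "ranger".
# $1 = account name (optional)
grant_shadow_read() {
  setfacl -m "u:${1:-ranger}:r" /etc/shadow || return 1
  getfacl /etc/shadow
}
```

Calling `grant_shadow_read` as root applies the ACL for the ranger account. If Unix authentication still fails afterwards, fall back to the chmod command above.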
Test
Create a secure key
Create the "safezone" key.
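The key can also be created from the command line with the standard `hadoop key` tool instead of the Web UI. This is a minimal sketch, assuming the Hadoop client is configured to use Ranger KMS as its key provider and that the calling user (for example keyadmin) has the Create privilege in Ranger KMS. The function name create_zone_key is hypothetical.

```shell
# Create an encryption key in the configured KMS, then list all keys
# with their metadata to confirm that the new key exists.
# $1 = key name, $2 = key length in bits (optional, defaults to 128)
create_zone_key() {
  hadoop key create "$1" -size "${2:-128}" || return 1
  hadoop key list -metadata
}
```

For the key used on this page, the call would be `create_zone_key safezone`.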
Create HDFS safe zone
hdfs dfs -mkdir /zone_encr
hdfs crypto -createZone -keyName safezone -path /zone_encr
hdfs crypto -listZones
zone_encr safezone
Create test users
User | Group | Expected privileges |
---|---|---|
user2 | dataadmin | Can read and write in the encrypted zone |
user3 | datascience | Can only read data from the encrypted zone |
user1 | no group | Malicious user, must not be able to read data in the encrypted zone |
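
The users and groups in the table can be created as local Linux accounts. This is a sketch under the assumption that the cluster resolves users and groups from the local operating system and that Ranger User Sync picks them up from there. In a cluster backed by AD or LDAP they would be created in the directory instead. The function name create_test_users is hypothetical.

```shell
# Create the groups and users from the table as local Linux accounts.
# Assumption: run as root on each node that resolves users and groups.
create_test_users() {
  groupadd dataadmin || return 1
  groupadd datascience || return 1
  useradd -G dataadmin user2 || return 1    # may read and write in the zone
  useradd -G datascience user3 || return 1  # may only read from the zone
  useradd user1                             # no additional group
}
```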
Grant HDFS privileges
Using the Ranger Web UI, give the dataadmin group full HDFS access to /zone_encr and the datascience group read-only access. As a test, the malicious user1 is also given full HDFS access to /zone_encr.
Create encryption policy
Using the Ranger KMS Web UI, prepare a policy for the dataadmin and datascience groups. The malicious user1 is not included here.
Because the datascience group is supposed to only read data, it is enough to grant it the Decrypt EEK privilege.
As user2
Create and read data in the secure zone.
echo "Top secret content" >secret.txt
hdfs dfs -copyFromLocal secret.txt /zone_encr
hdfs dfs -cat /zone_encr/secret.txt
Top secret content
As user3
user3 can read data but cannot modify any data in the secure zone. Writes are blocked by the HDFS policy.
hdfs dfs -cat /zone_encr/secret.txt
Top secret content
hdfs dfs -rm /zone_encr/secret.txt
rm: Failed to move to trash: hdfs://mdp1.sb.com:8020/zone_encr/secret.txt: Permission denied: user=user3, access=WRITE, inode="/zone_encr":hdfs:hdfs:drwxr-xr-x
echo "Next secret" > secret1
hdfs dfs -copyFromLocal secret1 /zone_encr
copyFromLocal: Permission denied: user=user3, access=WRITE, inode="/zone_encr":hdfs:hdfs:drwxr-xr-x
As user1
Because user1 has full HDFS privileges on /zone_encr, without encryption this user would be able to read and update data in this directory. But the data has an additional protection layer: user1 is not included in the Ranger KMS policy for the key, so decryption is refused.
hdfs dfs -cat /zone_encr/secret.txt
cat: User:user1 not allowed to do 'DECRYPT_EEK' on 'safezone'
hdfs dfs -copyToLocal /zone_encr/secret.txt
copyToLocal: User:user1 not allowed to do 'DECRYPT_EEK' on 'safezone'
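To confirm that the file really is stored encrypted, the raw bytes can be read through the /.reserved/raw virtual path, which bypasses decryption and is accessible only to the HDFS superuser. This is a minimal sketch, assuming it is run as the hdfs user (with a valid Kerberos ticket on a secured cluster). The function name show_raw_bytes is hypothetical.

```shell
# Dump the first 64 bytes of a file exactly as stored in HDFS,
# without decryption, in a printable form.
# $1 = absolute HDFS path of a file inside an encryption zone
show_raw_bytes() {
  hdfs dfs -cat "/.reserved/raw$1" | head -c 64 | od -c
}
```

Running `show_raw_bytes /zone_encr/secret.txt` should print unreadable bytes rather than the original text.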