HBase Ranger - stanislawbartkowski/hdpactivedirectory GitHub Wiki

HBase

Make sure that AD test users are prepared according to https://github.com/stanislawbartkowski/hdpactivedirectory/blob/master/README.md#ad-users-and-groups-used-for-testing.

User   Group        Role
user1  -            Malicious user, access blocked
user2  dataadmin    Data administrator, can read and modify the data
user3  datascience  Data consumer, can read but cannot modify the data

Enable Ranger plugin for HBase.

Cloudera.

As the hbase superuser, create the HBase namespace datalake.

In a CPD cluster, hbase cannot be accessed directly. Create an alternative hbase superuser; see https://github.com/stanislawbartkowski/wikis/wiki/IBM-BigSQL-and-Cloudera#hbase

su - hbase
kinit
hbase shell

create_namespace 'datalake'

Test description

  • user2:dataadmin - can create any table in the datalake namespace and load data into it
  • user3:datascience - is allowed only to read data in datalake namespace, cannot modify anything
  • user1:(no group) - is denied any access to datalake namespace.

Create Ranger policy

In the HBase Table field, enter datalake:*. Define the policy at the group level.
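The same policy can also be expressed as a JSON document for the Ranger REST API instead of being entered in the UI. The sketch below only prints such a document. The service name `cl1_hbase`, the policy name `datalake-access` and the choice of access types per group are assumptions made for illustration; adjust them to the HBase service registered in your Ranger instance and to the test description above.

```shell
# Print a Ranger policy definition for the datalake namespace.
# $1: name of the HBase service in Ranger (hypothetical default: cl1_hbase).
ranger_policy_json() {
  local service=${1:-cl1_hbase}
  cat <<EOF
{
  "service": "${service}",
  "name": "datalake-access",
  "resources": {
    "table": { "values": ["datalake:*"] },
    "column-family": { "values": ["*"] },
    "column": { "values": ["*"] }
  },
  "policyItems": [
    {
      "groups": ["dataadmin"],
      "accesses": [
        { "type": "read", "isAllowed": true },
        { "type": "write", "isAllowed": true },
        { "type": "create", "isAllowed": true }
      ]
    },
    {
      "groups": ["datascience"],
      "accesses": [ { "type": "read", "isAllowed": true } ]
    }
  ]
}
EOF
}

# Show the document; nothing is sent to Ranger here.
ranger_policy_json

# To submit it (host, port and credentials are placeholders):
# ranger_policy_json my_hbase_service | curl -u admin:password \
#   -H "Content-Type: application/json" -X POST -d @- \
#   "http://ranger-host:6080/service/public/v2/api/policy"
```

user1 belongs to neither group, so no policy item applies and access is denied.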

As user2

hbase shell

create 'datalake:testdata','cf1'

put 'datalake:testdata',1,'cf1:name','Hello'
put 'datalake:testdata',1,'cf1:number',1

put 'datalake:testdata',2,'cf1:name','Hello2'
put 'datalake:testdata',2,'cf1:number',2

put 'datalake:testdata',3,'cf1:name','Hello3'
put 'datalake:testdata',3,'cf1:number',3

scan 'datalake:testdata'

ROW                                       COLUMN+CELL                                                                                                             
 1                                        column=cf1:name, timestamp=1561061404909, value=Hello                                                                   
 1                                        column=cf1:number, timestamp=1561061417353, value=1                                                                     
 2                                        column=cf1:name, timestamp=1561061498304, value=Hello2                                                                  
 2                                        column=cf1:number, timestamp=1561061498347, value=2                                                                     
 3                                        column=cf1:name, timestamp=1561061498419, value=Hello3                                                                  
 3                                        column=cf1:number, timestamp=1561061500005, value=3                                                                     
3 row(s)
Took 0.8145 seconds               

As user3

hbase shell
scan 'datalake:testdata'

ROW                                       COLUMN+CELL                                                                                                             
 1                                        column=cf1:name, timestamp=1561061404909, value=Hello                                                                   
 1                                        column=cf1:number, timestamp=1561061417353, value=1                                                                     
 2                                        column=cf1:name, timestamp=1561061498304, value=Hello2                                                                  
 2                                        column=cf1:number, timestamp=1561061498347, value=2                                                                     
 3                                        column=cf1:name, timestamp=1561061498419, value=Hello3                                                                  
 3                                        column=cf1:number, timestamp=1561061500005, value=3                                                                     
3 row(s)
Took 0.8145 seconds               

Try to modify data

put 'datalake:testdata',3,'cf1:number',3

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user ‘[email protected]',action: put, tableName:datalake:testdata, family:cf1, column: number

Try to disable table

disable 'datalake:testdata'

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user '[email protected]' (action=create)

Try to create another table

create 'datalake:mytable','cf1'

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user '[email protected]' (action=create)

As user1

hbase shell
scan 'datalake:testdata'

ROW                                       COLUMN+CELL                                                                                                             

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user ‘[email protected]',action: scannerOpen, tableName:datalake:testdata, family:cf1.

HBase REST API

Security gap

There is a bug in HDP 3.1: the HBase REST API does not impersonate users, and all activities are executed as the hbase user. As a result, any user with access to the HBase REST API server has full privileges in HBase regardless of any security settings. The fix is to replace the hbase-rest jar delivered with the standard HDP payload with the latest hbase-rest version. The test was conducted using rel/2.2.0 GitHub version.
The same problem persists in Cloudera, CDP 7.1.4. The workaround below was tested only for HDP.

git clone https://github.com/apache/hbase.git -b branch-2.0
cd hbase
mvn package -DskipTests

(as root user)

cd /usr/hdp/3.1.0.0-78/hbase/lib

(archive existing hbase-rest jar file)

mkdir arch
mv hbase-rest-2.0.2.3.1.0.0-78.jar arch/
unlink hbase-rest.jar

(assuming /home/hbase/hbase as cloned Git repository)

ln -s /home/hbase/hbase/hbase-rest/target/hbase-rest-2.0.6-SNAPSHOT.jar hbase-rest.jar

Configure

Custom hbase-site.xml

Parameter Value
hbase.rest.support.proxyuser true
hbase.rest.authentication.type kerberos
hbase.rest.authentication.kerberos.keytab /etc/security/keytabs/spnego.service.keytab
hbase.rest.authentication.kerberos.principal <appropriate principal>
hbase.rest.keytab.file /etc/security/keytabs/hbase.service.keytab
hbase.rest.kerberos.principal <appropriate principal>

HDFS, custom core-site.xml

Parameter Value
hadoop.proxyuser.hbase.hosts *
hadoop.proxyuser.hbase.groups *

Restart all services affected.

Start and stop

  • HDP: The HBase REST API is not enabled by default and must be started manually after the cluster is started. Run the command below as the hbase user on the host where HBase Master is installed.

/usr/hdp/current/hbase-master/bin/hbase-daemon.sh start rest -p 9090

Wait several minutes until the service is ready.
HDP: Verify that the server is responding.

nc -vz localhost 9090

Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 127.0.0.1:9090.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
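Since the service can take several minutes to come up, the port check can be wrapped in a polling loop. This is a minimal sketch for bash, which provides the `/dev/tcp` pseudo-device; the function name `wait_for_port` and the default timeout of 300 seconds are illustrative choices.

```shell
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Try to open a TCP connection once per second until it succeeds or the
# timeout (default 300 s) is reached. Returns 0 when the port accepts
# connections, 1 otherwise. Requires bash for /dev/tcp.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-300} waited=0
  # The subshell opens the connection and closes it again on exit.
  while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage (not executed here):
# wait_for_port localhost 9090 600 && echo "HBase REST API is listening"
```

An open port only shows that the server is listening; the /version request is still the better check that the REST API responds.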

CDP: default port is 20550

nc -zv pimiento3 20550


curl -ik --negotiate -u : -H "Accept: text/xml" -X GET "http://hurds1.fyre.ibm.com:9090/version"

...............
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Version JVM="Oracle Corporation 1.8.0_201-25.201-b09" Jersey="" OS="Linux 3.10.0-957.10.1.el7.x86_64 amd64" REST="0.0.3" Server="jetty/9.3.25.v20180904"/>

List all tables.

curl -ik --negotiate -u : -H "Accept: text/xml" -X GET "http://hurds1.fyre.ibm.com:9090"


Stop or restart

/usr/hdp/current/hbase-master/bin/hbase-daemon.sh stop rest -p 9090
/usr/hdp/current/hbase-master/bin/hbase-daemon.sh restart rest -p 9090

Test

The code samples below assume that HBase REST API server node is hurds1.fyre.ibm.com and the server is listening on port 9090. Replace with the values corresponding to your environment.

Test as user1

User1 is a malicious user and should be denied any access to datalake tables.
Authenticate as user1.

curl -ik --negotiate -u : -X GET -H "Accept: text/xml" "http://hurds1.fyre.ibm.com:9090/datalake:testdata/*"

............
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /datalake:testdata/*. Reason:
<pre>    Request failed.</pre></p>
<hr />
</body>

Test as user2

User2 is a data administrator and should be able to read and modify the data.
Authenticate as user2.
Read datalake:testdata. The returned data is Base64 encoded.

curl -ik --negotiate -u : -X GET -H "Accept: text/xml" "http://hurds1:9090/datalake:testdata/*"

............
HTTP/1.1 200 OK
........
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="MQ=="><Cell column="Y2YxOm5hbWU=" timestamp="1561061404909">SGVsbG8=</Cell><Cell column="Y2YxOm51bWJlcg==" timestamp="1561061417353">MQ==</Cell></Row><Row key="Mg=="><Cell column="Y2YxOm5hbWU=" timestamp="1561061498304">SGVsbG8y</Cell><Cell column="Y2YxOm51bWJlcg==" timestamp="1561061498347">Mg==</Cell></Row><Row key="Mw=="><Cell column="Y2YxOm5hbWU=" timestamp="1561061498419">SGVsbG8z</Cell><Cell column="Y2YxOm51bWJlcg==" timestamp="1561063342967">Mw==</Cell></Row><Row key="NA=="><Cell column="Y2YxOm5hbWU=" timestamp="1564786098120">SGVsbG80</Cell></Row><Row key="NAo="><Cell column="Y2YxOm5hbWUK" timestamp="1564785434372">SGVsbG80Cg==</Cell></Row></CellSet>
.........
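The row keys, column names and values in the response can be decoded with the `base64` utility. The helper names `b64dec` and `decode_cells` are made up for this sketch, and `base64 -d` assumes GNU coreutils (older macOS versions use `-D`). The XML is picked apart with grep and sed, which is enough for this flat response but is not a general XML parser.

```shell
# Decode a single Base64 token.
b64dec() { printf '%s' "$1" | base64 -d; }

# Read a CellSet document on stdin and print one "column=value" line per cell.
# Trailing newlines inside a value are dropped by the command substitution.
decode_cells() {
  grep -o '<Cell column="[^"]*"[^>]*>[^<]*</Cell>' | while IFS= read -r cell; do
    col=$(printf '%s' "$cell" | sed 's/.*column="\([^"]*\)".*/\1/')
    val=$(printf '%s' "$cell" | sed 's/.*>\([^<]*\)<\/Cell>/\1/')
    printf '%s=%s\n' "$(b64dec "$col")" "$(b64dec "$val")"
  done
}

# First row of the response above.
b64dec 'MQ=='; echo    # row key, prints 1
printf '%s' '<CellSet><Row key="MQ=="><Cell column="Y2YxOm5hbWU=" timestamp="1561061404909">SGVsbG8=</Cell><Cell column="Y2YxOm51bWJlcg==" timestamp="1561061417353">MQ==</Cell></Row></CellSet>' | decode_cells
# prints:
# cf1:name=Hello
# cf1:number=1
```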

Modify the data. The curl command below is the equivalent of the hbase shell command: put 'datalake:testdata',4,'cf1:name','Hello4'. The row key, column and value in the request body are Base64 encoded (NA== is 4).

curl -ik --negotiate -u : -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="NA=="><Cell column="Y2YxOm5hbWU=">SGVsbG80</Cell></Row></CellSet>' "http://hurds1:9090/datalake:testdata/1"

........
HTTP/1.1 200 OK
WWW-Authenticate: Negotiate 
........
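The request body for such a PUT can be generated rather than encoded by hand. The sketch below uses hypothetical helper names `b64enc` and `cellset_xml`. It encodes with `printf '%s'` rather than `echo`, because `echo` appends a newline that becomes part of the key: `echo 4 | base64` yields `NAo=`, which matches the extra `NAo=` row visible in the read response earlier.

```shell
# Base64-encode a string without a trailing newline in the input,
# and strip the line wrapping that base64 adds for long values.
b64enc() { printf '%s' "$1" | base64 | tr -d '\n'; }

# cellset_xml ROW COLUMN VALUE
# Print a CellSet document with a single cell, ready to pass to curl -d.
cellset_xml() {
  printf '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="%s"><Cell column="%s">%s</Cell></Row></CellSet>' \
    "$(b64enc "$1")" "$(b64enc "$2")" "$(b64enc "$3")"
}

# Reproduces the body of the curl command above.
cellset_xml 4 cf1:name Hello4; echo
```

The result can be passed to curl with `-d "$(cellset_xml 4 cf1:name Hello4)"`.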

Using hbase shell verify that the expected data is added or modified.

hbase shell

scan 'datalake:testdata'

ROW                                  COLUMN+CELL                                                                                              
 1                                   column=cf1:name, timestamp=1561061404909, value=Hello                                                    
 1                                   column=cf1:number, timestamp=1561061417353, value=1                                                      
 2                                   column=cf1:name, timestamp=1561061498304, value=Hello2                                                   
 2                                   column=cf1:number, timestamp=1561061498347, value=2                                                      
 3                                   column=cf1:name, timestamp=1561061498419, value=Hello3                                                   
 3                                   column=cf1:number, timestamp=1561063342967, value=3                                                      
 4                                   column=cf1:name, timestamp=1564828239454, value=Hello4       

Test as user3

User3 is a data scientist and should be able to read the datalake data but not modify it.
Authenticate as user3.

curl -ik --negotiate -u : -X GET -H "Accept: text/xml" "http://hurds1:9090/datalake:testdata/*"

......
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="MQ=="><Cell column="Y2YxOm5hbWU=" timestamp="1561061404909">SGVsbG8=</Cell><Cell column="Y2YxOm51bWJlcg==" timestamp="1561061417353">MQ==</Cell></Row><Row key="Mg=="><Cell column="Y2YxOm5hbWU=" timestamp="1561061498304">SGVsbG8y</Cell><Cell column="Y2YxOm51bWJlcg==" timestamp="1561061498347">Mg==</Cell></Row><Row key="Mw=="><Cell column="Y2YxOm5hbWU=" timestamp="1561061498419">SGVsbG8z</Cell><Cell column="Y2YxOm51bWJlcg==" timestamp="1561063342967">Mw==</Cell></Row><Row key="NA=="><Cell column="Y2YxOm5hbWU=" timestamp="1564828239454">SGVsbG80</Cell></Row><Row key="NAo="><Cell column="Y2YxOm5hbWUK" timestamp="1564785434372">SGVsbG80Cg==</Cell></Row></CellSet>
.....

Try to modify the data.

curl -ik --negotiate -u : -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="NA=="><Cell column="Y2YxOm5hbWU=">SGVsbG80</Cell></Row></CellSet>' "http://hurds1:9090/datalake:testdata/1"

........
Forbidden
org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user ‘user3',action: put, tableName:datalake:testdata, family:cf1, column: name
	at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.requirePermission(RangerAuthorizationCoprocessor.java:584)
...
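The manual checks above can be turned into repeatable assertions by comparing the HTTP status code of each request with the expected one. This is a sketch only: `http_status` and `expect_status` are hypothetical helper names, and a valid Kerberos ticket for the user under test is assumed to exist already.

```shell
# http_status METHOD URL
# Send the request with SPNEGO authentication and print only the HTTP status code.
# curl prints 000 when no HTTP response was received.
http_status() {
  curl -sk --negotiate -u : -o /dev/null -w '%{http_code}' \
    -X "$1" -H "Accept: text/xml" "$2"
}

# expect_status EXPECTED METHOD URL
# Print PASS or FAIL and return non-zero when the status code differs.
expect_status() {
  local got
  got=$(http_status "$2" "$3")
  if [ "$got" = "$1" ]; then
    echo "PASS $2 $3 -> $got"
  else
    echo "FAIL $2 $3 -> $got (expected $1)"
    return 1
  fi
}

# Usage after kinit as user2 (not executed here):
# expect_status 200 GET "http://hurds1:9090/datalake:testdata/*"
```

For write tests the helper would also need the Content-Type header and the request body used in the PUT commands above.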

WebHBase and Knox

Configuration

Knox Gateway is the recommended means of interacting with HDP/Hadoop UIs and REST APIs. It acts as a proxy between the user or developer and the service, so the HBase service and its node are not exposed directly.
To configure Knox for HBase, go to Ambari->Knox->Configs->Advanced Topology. The default template is incorrect: the port number should point to the HBase REST API server, not to the HBase Master port.

            <service>
                <role>WEBHBASE</role>
                <url>http://{{hbase_master_host}}:{{hbase_master_port}}</url>
            </service>

Replace with valid host name and port number.

            <service>
                <role>WEBHBASE</role>
                <url>http://hurds1.fyre.ibm.com:9090</url>
            </service>

Restart the Knox service. Verify that Knox HBase is active and responding.

curl -ik --negotiate -u : -H "Accept: text/xml" -X GET "https://a1.fyre.ibm.com:8443/gateway/default/hbase/version"

Server: Jetty(9.4.12.v20180830)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Version JVM="Oracle Corporation 1.8.0_212-25.212-b04" Jersey="" OS="Linux 3.10.0-957.21.3.el7.x86_64 amd64" REST="0.0.3" Server="jetty/9.3.25.v20180904"/>

Test Knox HBase

The tests are exactly the same as for direct access to the HBase REST API. The only difference is that the Knox URL is used instead of the HBase REST API URL.
The tests assume that the Knox hostname is a1.fyre.ibm.com and the Knox port is 8443.
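Because only the URL prefix changes between the direct and the Knox tests, one small helper can produce both forms. The function name `hbase_rest_url` and the environment variables `KNOX_HOST`, `KNOX_PORT`, `KNOX_TOPOLOGY`, `REST_HOST` and `REST_PORT` are invented for this sketch; the defaults are the host names and ports used on this page.

```shell
# hbase_rest_url PATH
# Print the Knox URL when KNOX_HOST is set, otherwise the direct REST API URL.
hbase_rest_url() {
  if [ -n "${KNOX_HOST:-}" ]; then
    printf 'https://%s:%s/gateway/%s/hbase/%s\n' \
      "$KNOX_HOST" "${KNOX_PORT:-8443}" "${KNOX_TOPOLOGY:-default}" "$1"
  else
    printf 'http://%s:%s/%s\n' \
      "${REST_HOST:-hurds1.fyre.ibm.com}" "${REST_PORT:-9090}" "$1"
  fi
}

# Direct access to the HBase REST API.
hbase_rest_url 'datalake:testdata/*'

# The same request through Knox.
KNOX_HOST=a1.fyre.ibm.com
hbase_rest_url 'datalake:testdata/*'
```

On Cloudera, setting `KNOX_TOPOLOGY=cdp-proxy-api` would give the topology path used in the example further down.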

Cloudera: the topology name is cdp-proxy-api. An example curl call:

curl -ik --negotiate -u : -H "Accept: text/xml" -X GET "https://pimiento1.fyre.ibm.com:8443/gateway/cdp-proxy-api/hbase/version"

Cloudera: during my test, I was unable to authenticate using a Kerberos ticket (--negotiate -u :). Only providing user credentials gave access to the service.

curl -ik --negotiate -u user1:password -H "Accept: text/xml" -X GET "https://pimiento1.fyre.ibm.com:8443/gateway/cdp-proxy-api/hbase/version"

Test as user1

The access should be denied.

curl -ik --negotiate -u : -H "Accept: text/xml" -X GET "https://a1.fyre.ibm.com:8443/gateway/default/hbase/datalake:testdata/*"

Test as user2

Both tests, modify data and read data, should pass.

curl -ik --negotiate -u : -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="NA=="><Cell column="Y2YxOm5hbWU=">SGVsbG80</Cell></Row></CellSet>' "https://a1.fyre.ibm.com:8443/gateway/default/hbase/datalake:testdata/1"

curl -ik --negotiate -u : -H "Accept: text/xml" -X GET "https://a1.fyre.ibm.com:8443/gateway/default/hbase/datalake:testdata/*"

Test as user3

Request to modify data should fail.

curl -ik --negotiate -u : -X PUT -H "Accept: text/xml" -H "Content-Type: text/xml" -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="NA=="><Cell column="Y2YxOm5hbWU=">SGVsbG80</Cell></Row></CellSet>' "https://a1.fyre.ibm.com:8443/gateway/default/hbase/datalake:testdata/1"

Reading the data should be successful

curl -ik --negotiate -u : -H "Accept: text/xml" -X GET "https://a1.fyre.ibm.com:8443/gateway/default/hbase/datalake:testdata/*"
