# HBase

At its core, HBase is split into Region Servers that take care of horizontal sharding (range partitioning) of table data and automatically adapt to growth in data by repartitioning it. To achieve this auto-sharding, HBase follows a mechanism that involves write-ahead commit logs and asynchronous merging (compaction) of data over time. These Region Servers sit on top of HDFS.
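To make the range-partitioning idea concrete, here is a toy Python sketch. This is not HBase code; the split points and region names are invented purely to illustrate how sorted split points map a row key to the region that owns its key range:

```python
import bisect

# Hypothetical split points; in HBase these evolve as regions split.
SPLIT_POINTS = ["C", "M", "T"]  # boundaries over the row-key space

def region_for(row_key: str) -> str:
    """Return the (toy) region holding row_key.

    Keys sort lexicographically; each region owns a contiguous
    key range, which is what makes row-key range scans efficient.
    """
    return f"region-{bisect.bisect_right(SPLIT_POINTS, row_key)}"

print(region_for("C00001"))  # region-1 (between "C" and "M")
print(region_for("A123"))    # region-0 (before "C")
```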
- ssh to the VM:
$ ssh maria_dev@localhost -p 2222
maria_dev@localhost's password:
Last login: Fri Mar 26 07:09:40 2021 from 10.0.2.2
[maria_dev@sandbox ~]$
- Switch to the `root` user and then switch to user `hbase`:
[maria_dev@sandbox ~]$ su
Password:
[root@sandbox maria_dev]# su hbase
[hbase@sandbox maria_dev]$
- Run the `hbase shell` command to launch the shell:
[hbase@sandbox maria_dev]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.0.0-1245, r53538b8ab6749cbb6fdc0fe448b89aa82495fb3f, Fri Aug 26 01:32:27 UTC 2016
hbase(main):001:0>
- Now fire the `list` command, which lists the existing HBase tables:
hbase(main):003:0> list
TABLE
ATLAS_ENTITY_AUDIT_EVENTS
atlas_titan
iemployee
3 row(s) in 0.4590 seconds
=> ["ATLAS_ENTITY_AUDIT_EVENTS", "atlas_titan", "iemployee"]
hbase(main):004:0>
Note: you may get the error below if the HBase service is not running:
hbase(main):001:0> list
TABLE

ERROR: Can't get master address from ZooKeeper; znode data == null

Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:

hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'

hbase(main):002:0>
To fix this, start the HBase service:
- Log in to Ambari as admin
- Start the HBase service: right panel -> `HBase` -> `Service Actions` drop-down -> `Start`
- Creating a namespace: `create_namespace '<namespace_name>'`
hbase(main):005:0* create_namespace 'kaushik'
0 row(s) in 0.2850 seconds
- Use `list_namespace` to check all available namespaces:
hbase(main):006:0> list_namespace
NAMESPACE
default
hbase
kaushik
3 row(s) in 0.0950 seconds
`default` and `hbase` are the two predefined namespaces.
- Use `describe_namespace '<namespace_name>'` to describe a given namespace:
hbase(main):001:0> describe_namespace 'kaushik'
DESCRIPTION
{NAME => 'kaushik'}
1 row(s) in 0.6230 seconds
A `namespace` is equivalent to what a `database` is in RDBMS terms.
- Create a table under a namespace: `create '[<namespace_name>:]<table_name>', '<column_family>' [, '<column_family>', ...]`
# cf_1 is the (only) column family,
# ...no need to mention columns inside it
hbase(main):002:0> create 'kaushik:test', 'cf_1'
0 row(s) in 2.7700 seconds
=> Hbase::Table - kaushik:test
If a namespace is not specified, the table will be created under the `default` namespace.
- `list` will list all available tables:
hbase(main):001:0> list
TABLE
ATLAS_ENTITY_AUDIT_EVENTS
atlas_titan
iemployee
kaushik:test
4 row(s) in 0.2570 seconds
=> ["ATLAS_ENTITY_AUDIT_EVENTS", "atlas_titan", "iemployee", "kaushik:test"]
- Use `describe` to describe any table:
hbase(main):003:0> describe 'kaushik:test'
Table kaushik:test is ENABLED
kaushik:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf_1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1810 seconds
- The `drop` command is used to delete a table; before deleting, a table needs to be disabled using the `disable` command:
hbase(main):004:0> disable 'kaushik:test'
0 row(s) in 3.2250 seconds
hbase(main):005:0> drop 'kaushik:test'
- The `alter_namespace` command is used to alter an existing namespace:
hbase(main):005:0> alter_namespace 'kaushik', {METHOD => 'set', \
hbase(main):006:1* 'VERSION' => 'DRAFT' }
0 row(s) in 0.0570 seconds
After the above command, a new property `VERSION` will be added to the namespace `kaushik`:
hbase(main):007:0> describe_namespace 'kaushik'
DESCRIPTION
{NAME => 'kaushik', VERSION => 'DRAFT'}
1 row(s) in 0.0130 seconds
- A namespace can be dropped using `drop_namespace`:
hbase(main):008:0> drop_namespace 'kaushik'
0 row(s) in 0.0680 seconds
hbase(main):009:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.6190 seconds
hbase(main):010:0>
- DDL commands and CRUD operations
- Create a table `customer` under namespace `sales` with versioning enabled (if the `sales` namespace does not exist yet, create it first with `create_namespace 'sales'`):
# enable versioning (4) for first column family ctInfo
hbase(main):001:0> create 'sales:customer', \
hbase(main):002:0* {NAME => 'ctInfo', VERSIONS => 4}, \
hbase(main):003:0* 'demo'
0 row(s) in 2.7460 seconds
=> Hbase::Table - sales:customer
- Versioning needs to be enabled for each column family separately; if not specified, it is set to the default of 1
- Here we have enabled 4 versions for column family `ctInfo`. Therefore, for each column in this column family a maximum of 4 timestamped versions will be maintained for each value. We can verify this by describing the created `sales:customer` table: notice that column family `ctInfo` has `VERSIONS => '4'` whereas column family `demo` has `VERSIONS => '1'`:
hbase(main):004:0> describe 'sales:customer'
Table sales:customer is ENABLED
sales:customer
COLUMN FAMILIES DESCRIPTION
{NAME => 'ctInfo', BLOOMFILTER => 'ROW', VERSIONS => '4', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'demo', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.3020 seconds
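As a toy illustration of what `VERSIONS => 4` means for a single cell, here is a Python sketch (a model only, not HBase internals; the values are placeholders, and the timestamps are reused from the transcript above). A cell keeps up to the configured number of timestamped values, and reads return only the newest unless more versions are requested:

```python
# Toy model of one HBase cell under VERSIONS => 4
MAX_VERSIONS = 4

cell = []  # list of (timestamp, value), kept newest-first

def put(value, ts):
    cell.append((ts, value))
    cell.sort(key=lambda tv: tv[0], reverse=True)  # newest first
    del cell[MAX_VERSIONS:]  # versions beyond the limit are discarded

def get(versions=1):
    # like get/scan: only the newest version unless VERSIONS => n is given
    return cell[:versions]

put('old-email@example', 1616771164898)
put('new-email@example', 1616771484147)
print(get())            # [(1616771484147, 'new-email@example')]
print(get(versions=4))  # both timestamped versions, newest first
```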
- `scan`: the equivalent of `select *`:
hbase(main):005:0> scan 'sales:customer'
ROW COLUMN+CELL
0 row(s) in 0.0720 seconds
- The created table has no rows because we have not added any data yet
- `put`: adds data to a table
- Add data for ONLY one column at a time:
# arguments: table, row key, column name, value
hbase(main):006:0> put 'sales:customer', \
hbase(main):007:0* 'C00001', \
hbase(main):008:0* 'ctInfo:name', 'Sudhir Mishra'
0 row(s) in 0.0890 seconds
hbase(main):009:0> put 'sales:customer', \
hbase(main):010:0* 'C00001', \
hbase(main):011:0* 'ctInfo:mobile', '9988771122'
0 row(s) in 0.0110 seconds
hbase(main):012:0> put 'sales:customer', \
hbase(main):013:0* 'C00001', \
hbase(main):014:0* 'ctInfo:email', '[email protected]'
0 row(s) in 0.0130 seconds
hbase(main):015:0> put 'sales:customer', \
hbase(main):016:0* 'C00001', \
hbase(main):017:0* 'demo:age', '34'
0 row(s) in 0.0240 seconds
hbase(main):018:0> put 'sales:customer', \
hbase(main):019:0* 'C00001', \
hbase(main):020:0* 'demo:occupation', 'advocate'
0 row(s) in 0.0140 seconds
hbase(main):021:0> put 'sales:customer', \
hbase(main):022:0* 'C00002', \
hbase(main):023:0* 'ctInfo:name', 'Kunal K Bajaj'
0 row(s) in 0.0140 seconds
hbase(main):024:0> put 'sales:customer', \
hbase(main):025:0* 'C00002', \
hbase(main):026:0* 'ctInfo:tel', '02211223344'
0 row(s) in 0.0130 seconds
hbase(main):027:0> put 'sales:customer', \
hbase(main):028:0* 'C00002', \
hbase(main):029:0* 'demo:age', '42'
0 row(s) in 0.0110 seconds
hbase(main):030:0> put 'sales:customer', \
hbase(main):031:0* 'C00002', \
hbase(main):032:0* 'demo:education', 'graduate'
0 row(s) in 0.0110 seconds
- See the added data using `scan`:
hbase(main):033:0> scan 'sales:customer'
ROW COLUMN+CELL
C00001 column=ctInfo:email, timestamp=1616771164898, [email protected]
C00001 column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
C00001 column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
C00001 column=demo:age, timestamp=1616771187409, value=34
C00001 column=demo:occupation, timestamp=1616771214484, value=advocate
C00002 column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
C00002 column=ctInfo:tel, timestamp=1616771354236, value=02211223344
C00002 column=demo:age, timestamp=1616771379529, value=42
C00002 column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0530 seconds
- Logical view of the table will be:
{
"C00001" : {
"ctInfo" : {
1616771022652 : { "name" : "Sudhir Mishra" },
1616771062503 : { "mobile" : "9988771122" },
1616771164898 : { "email" : "[email protected]" }
},
"demo" : {
1616771187409 : { "age" : "34" },
1616771214484 : { "occupation" : "advocate" }
}
},
"C00002" : {
"ctInfo" : {
1616771311930 : { "name" : "Kunal K Bajaj" },
1616771354236 : { "tel" : "02211223344" }
},
"demo" : {
1616771379529 : { "age" : "42" },
1616771405214 : { "education" : "graduate" }
}
}
}
This is a good example of a sparse table:
| Row Key | ctInfo:name | ctInfo:mobile | ctInfo:email | ctInfo:tel | demo:age | demo:occupation | demo:education |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C00001 | 1616771022652: Sudhir Mishra | 1616771062503: 9988771122 | 1616771164898: [email protected] | | 1616771187409: 34 | 1616771214484: advocate | |
| C00002 | 1616771311930: Kunal K Bajaj | | | 1616771354236: 02211223344 | 1616771379529: 42 | | 1616771405214: graduate |
- Now let us add one more email id for "C00001" and do a scan; we see that we get the latest email:
hbase(main):034:0> put 'sales:customer', \
hbase(main):035:0* 'C00001', \
hbase(main):036:0* 'ctInfo:email', '[email protected]'
0 row(s) in 0.3130 seconds
hbase(main):037:0> scan 'sales:customer'
ROW COLUMN+CELL
C00001 column=ctInfo:email, timestamp=1616771484147, [email protected]
C00001 column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
C00001 column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
C00001 column=demo:age, timestamp=1616771187409, value=34
C00001 column=demo:occupation, timestamp=1616771214484, value=advocate
C00002 column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
C00002 column=ctInfo:tel, timestamp=1616771354236, value=02211223344
C00002 column=demo:age, timestamp=1616771379529, value=42
C00002 column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0550 seconds
- To get all the versions of the email column values, we need to specify the number of versions for the scan:
hbase(main):039:0> scan 'sales:customer', {VERSIONS => 4}
ROW COLUMN+CELL
C00001 column=ctInfo:email, timestamp=1616771484147, [email protected]
C00001 column=ctInfo:email, timestamp=1616771164898, [email protected]
C00001 column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
C00001 column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
C00001 column=demo:age, timestamp=1616771187409, value=34
C00001 column=demo:occupation, timestamp=1616771214484, value=advocate
C00002 column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
C00002 column=ctInfo:tel, timestamp=1616771354236, value=02211223344
C00002 column=demo:age, timestamp=1616771379529, value=42
C00002 column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0570 seconds
- Now we see both email ids
- `get`: fetch a specific row by key:
hbase(main):041:0> get 'sales:customer', 'C00001'
COLUMN CELL
ctInfo:email timestamp=1616771484147, [email protected]
ctInfo:mobile timestamp=1616771062503, value=9988771122
ctInfo:name timestamp=1616771022652, value=Sudhir Mishra
demo:age timestamp=1616771187409, value=34
demo:occupation timestamp=1616771214484, value=advocate
5 row(s) in 0.0220 seconds
- Again, to get all versions we need to specify the number of versions:
hbase(main):043:0> get 'sales:customer', 'C00001', {COLUMN => 'ctInfo', VERSIONS => 2}
COLUMN CELL
ctInfo:email timestamp=1616771484147, [email protected]
ctInfo:email timestamp=1616771164898, [email protected]
ctInfo:mobile timestamp=1616771062503, value=9988771122
ctInfo:name timestamp=1616771022652, value=Sudhir Mishra
4 row(s) in 0.0410 seconds
- Now we see both email ids
- `delete` and `deleteall`: delete a specific column value or a full row
- To delete a specific version of a column value we need to give its version (timestamp):
hbase(main):045:0> delete 'sales:customer', 'C00001', 'ctInfo:email', 1616771164898
0 row(s) in 0.0250 seconds
hbase(main):046:0> get 'sales:customer', 'C00001', {COLUMN => 'ctInfo', VERSIONS => 2}
COLUMN CELL
ctInfo:email timestamp=1616771484147, [email protected]
ctInfo:mobile timestamp=1616771062503, value=9988771122
ctInfo:name timestamp=1616771022652, value=Sudhir Mishra
3 row(s) in 0.0280 seconds
hbase(main):047:0> scan 'sales:customer'
ROW COLUMN+CELL
C00001 column=ctInfo:email, timestamp=1616771484147, [email protected]
C00001 column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
C00001 column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
C00001 column=demo:age, timestamp=1616771187409, value=34
C00001 column=demo:occupation, timestamp=1616771214484, value=advocate
C00002 column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
C00002 column=ctInfo:tel, timestamp=1616771354236, value=02211223344
C00002 column=demo:age, timestamp=1616771379529, value=42
C00002 column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0530 seconds
hbase(main):048:0> scan 'sales:customer', {VERSIONS => 4}
ROW COLUMN+CELL
C00001 column=ctInfo:email, timestamp=1616771484147, [email protected]
C00001 column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
C00001 column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
C00001 column=demo:age, timestamp=1616771187409, value=34
C00001 column=demo:occupation, timestamp=1616771214484, value=advocate
C00002 column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
C00002 column=ctInfo:tel, timestamp=1616771354236, value=02211223344
C00002 column=demo:age, timestamp=1616771379529, value=42
C00002 column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.1720 seconds
- Delete a full row with `deleteall`:
hbase(main):050:0> deleteall 'sales:customer', 'C00002'
0 row(s) in 0.0060 seconds
hbase(main):051:0> scan 'sales:customer', {VERSIONS => 4}
ROW COLUMN+CELL
C00001 column=ctInfo:email, timestamp=1616771484147, [email protected]
C00001 column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
C00001 column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
C00001 column=demo:age, timestamp=1616771187409, value=34
C00001 column=demo:occupation, timestamp=1616771214484, value=advocate
1 row(s) in 0.0300 seconds
- We will create a `sales` table and populate it with sales order information
- From Ambari, start the HBase service
[maria_dev@sandbox ~]$ su
Password:
[root@sandbox maria_dev]# su hbase
[hbase@sandbox maria_dev]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.0.0-1245, r53538b8ab6749cbb6fdc0fe448b89aa82495fb3f, Fri Aug 26 01:32:27 UTC 2016
hbase(main):001:0> create table "kaushik:sales", "cf_order"
NoMethodError: undefined method `table' for #<Object:0x3d3c886f>
hbase(main):007:0> create "kaushik:sales", "cf_order"
0 row(s) in 3.3440 seconds
=> Hbase::Table - kaushik:sales
hbase(main):008:0> describe "kaushik:sales"
Table kaushik:sales is ENABLED
kaushik:sales
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf_order', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FORE
VER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.3510 seconds
hbase(main):009:0>
- We will use `salesOrders.csv` from the Internet:
$ curl -o ./sales.csv https://raw.githubusercontent.com/bsullins/data/master/salesOrders.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  453k  100  453k    0     0   226k      0  0:00:02  0:00:02 --:--:--  200k
$ ls sales.csv
sales.csv
$ head sales.csv
RowKey,Order ID,Order Date,Ship Date,Ship Mode,Profit,Quantity,Sales
1,CA-2011-100006,2013-09-07 00:00:00,2013-09-13 00:00:00,Standard Class,109.61,3.00,377.97
2,CA-2011-100090,2013-07-08 00:00:00,2013-07-12 00:00:00,Standard Class,-19.09,9.00,699.19
3,CA-2011-100293,2013-03-14 00:00:00,2013-03-18 00:00:00,Standard Class,31.87,6.00,91.06
4,CA-2011-100328,2013-01-28 00:00:00,2013-02-03 00:00:00,Standard Class,1.33,1.00,3.93
5,CA-2011-100363,2013-04-08 00:00:00,2013-04-15 00:00:00,Standard Class,7.72,5.00,21.38
6,CA-2011-100391,2013-05-25 00:00:00,2013-05-29 00:00:00,Standard Class,6.73,2.00,14.62
7,CA-2011-100678,2013-04-18 00:00:00,2013-04-22 00:00:00,Standard Class,61.79,11.00,697.07
8,CA-2011-100706,2013-12-16 00:00:00,2013-12-18 00:00:00,Second Class,17.72,8.00,129.44
9,CA-2011-100762,2013-11-24 00:00:00,2013-11-29 00:00:00,Standard Class,219.08,11.00,508.62
- We will remove the first (header) row:
$ sed -i '1d' sales.csv
$ head sales.csv
1,CA-2011-100006,2013-09-07 00:00:00,2013-09-13 00:00:00,Standard Class,109.61,3.00,377.97
2,CA-2011-100090,2013-07-08 00:00:00,2013-07-12 00:00:00,Standard Class,-19.09,9.00,699.19
...
- We will transfer the downloaded file to the VM:
$ scp -P 2222 sales.csv maria_dev@localhost:/home/maria_dev/.
- Now we will copy the file `sales.csv` to HDFS:
$ ssh -p 2222 maria_dev@localhost
maria_dev@localhost's password:
Last login: Fri Mar 26 10:22:44 2021 from 10.0.2.2
[maria_dev@sandbox ~]$ head sales.csv
1,CA-2011-100006,2013-09-07 00:00:00,2013-09-13 00:00:00,Standard Class,109.61,3.00,377.97
2,CA-2011-100090,2013-07-08 00:00:00,2013-07-12 00:00:00,Standard Class,-19.09,9.00,699.19
...
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal sales.csv /tmp
[maria_dev@sandbox ~]$ hadoop fs -ls /tmp
Found 7 items
drwxrwxrwx - maria_dev hdfs 0 2020-03-05 17:09 /tmp/.pigjobs
drwxrwxrwx - maria_dev hdfs 0 2020-03-05 17:09 /tmp/.pigscripts
drwxrwxrwx - maria_dev hdfs 0 2020-03-05 17:09 /tmp/.pigstore
drwxr-xr-x - hdfs hdfs 0 2016-10-25 07:48 /tmp/entity-file-history
drwx-wx-wx - ambari-qa hdfs 0 2016-10-25 07:51 /tmp/hive
drwx------ - maria_dev hdfs 0 2020-03-05 17:14 /tmp/maria_dev
-rw-r--r-- 1 maria_dev hdfs 459327 2021-03-31 16:02 /tmp/sales.csv
[maria_dev@sandbox ~]$
- We will now load the `kaushik:sales` HBase table with data from `sales.csv`:
[maria_dev@sandbox ~]$ su
[root@sandbox maria_dev]# hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator=, \
-Dimporttsv.columns="HBASE_ROW_KEY,cf_order:orderId,cf_order:orderDate,cf_order:shipDate,cf_order:shipMode,cf_order:profit,cf_order:quantity,cf_order:sales" \
kaushik:sales \
hdfs://sandbox.hortonworks.com:/tmp/sales.csv
Here `-Dimporttsv.separator` gives the field separator of the input file; `-Dimporttsv.columns` maps the input fields, the first item (`HBASE_ROW_KEY`) being the row key and the rest the target columns; the last two arguments are the target table name and the HDFS path of the input CSV file.
- This will fire a MapReduce job:
2021-03-31 16:39:02,868 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x62e7f11d connecting to ZooKeeper ensemble=sandbox.hortonworks.com:2181
2021-03-31 16:39:02,879 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1245--1, built on 08/26/2016 00:47 GMT
2021-03-31 16:39:02,879 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=sandbox.hortonworks.com
2021-03-31 16:39:02,879 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_111
...
- and the table will be filled with rows:
hbase(main):002:0> scan "kaushik:sales"
...
999 column=cf_order:shipDate, timestamp=1617208741407, value=2014-08-23 00:00:00
999 column=cf_order:shipMode, timestamp=1617208741407, value=Same Day
5009 row(s) in 11.8840 seconds
- HBase exposes a REST API that can be directly accessed from the client system
- We will use a Python client to create a sparse table using the movie rating data:
$ head u.data
0 50 5 881250949
0 172 5 881250949
0 133 1 881250949
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
- The first column is the user id, the second is the movie id, the third is the user's rating of the movie, and the last is the rating timestamp
- In our sparse table we will keep only the movie ratings by each user, so logically the table will look something like the diagram below (see the Python sketch at the end of this page for loading it):
+---------+ +--------------+ +---------------+ +---------------+
| user_id | ----> | movie_id: 50 | --> | movie_id: 172 | --> | movie_id: 133 |
+---------+ | rating: 5 | | rating: 5 | | rating: 5 |
+--------------+ +---------------+ +---------------+
- We will use the `starbase` Python package, which provides a nice programming interface over the HBase REST API. Install it with `pip install starbase`.
- To use the HBase REST interface, we first need to start the HBase REST server; here it serves requests on port 8000 and debugging information on port 8001:
[root@sandbox maria_dev]# /usr/hdp/current/hbase-master/bin/hbase-daemon.sh \
> start rest -p 8000 --infoport 8001
starting rest, logging to /var/log/hbase/hbase-maria_dev-rest-sandbox.hortonworks.com.out
[root@sandbox maria_dev]#
- We also need to enable the corresponding port forwarding for the VM:
- Run the VM
- Running VM -> Right Click -> Settings -> Network -> Advanced -> Port Forwarding -> add 2 new port-forwarding rules by clicking the + icon:
| Name | Protocol | Host IP | Host Port | Guest IP | Guest Port |
| --- | --- | --- | --- | --- | --- |
| HBase REST | TCP | 127.0.0.1 | 8000 | | 8000 |
| HBase REST Info | TCP | 127.0.0.1 | 8001 | | 8001 |
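With the REST server running and the ports forwarded, a minimal sketch of the Python client could look like the following. This assumes `starbase`'s `Connection`, `table`, and `batch` interface; the table name `ratings` and column family `rating` are illustrative choices, and `u.data` is expected in the current directory:

```python
from starbase import Connection

# Connect to the HBase REST server through the forwarded port
c = Connection('127.0.0.1', '8000')

ratings = c.table('ratings')   # illustrative table name
if ratings.exists():
    ratings.drop()             # start fresh for this demo
ratings.create('rating')       # single column family: rating

# One row per user, one column per rated movie -- exactly the
# sparse layout sketched in the diagram above.
batch = ratings.batch()
with open('u.data', 'r') as f:
    for line in f:
        (user_id, movie_id, rating, timestamp) = line.split()
        batch.update(user_id, {'rating': {movie_id: rating}})
batch.commit(finalize=True)

# Fetch one user's row: a dict mapping rating:<movie_id> to the value
print(ratings.fetch('0'))
```

After running the script, a `scan 'ratings'` from the HBase shell should show one row per user id, with a `rating:<movie_id>` column for every movie that user rated.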