HBase

At its core, HBase is split into Region Servers, which implement a form of horizontal sharding (range partitioning) and automatically adapt to growing data by repartitioning it. To achieve this auto-sharding, HBase uses a mechanism built around write-ahead commit logs whose contents are merged together asynchronously over time. The Region Servers themselves sit on top of HDFS.
- ssh to the VM:

```
$ ssh maria_dev@localhost -p 2222
maria_dev@localhost's password:
Last login: Fri Mar 26 07:09:40 2021 from 10.0.2.2
[maria_dev@sandbox ~]$
```
- Switch to the `root` user and then switch to the user `hbase`:

```
[maria_dev@sandbox ~]$ su
Password:
[root@sandbox maria_dev]# su hbase
[hbase@sandbox maria_dev]$
```
- Run the `hbase shell` command to launch the shell:

```
[hbase@sandbox maria_dev]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.0.0-1245, r53538b8ab6749cbb6fdc0fe448b89aa82495fb3f, Fri Aug 26 01:32:27 UTC 2016

hbase(main):001:0>
```
- Now fire the `list` command, which lists the existing HBase tables:

```
hbase(main):003:0> list
TABLE
ATLAS_ENTITY_AUDIT_EVENTS
atlas_titan
iemployee
3 row(s) in 0.4590 seconds

=> ["ATLAS_ENTITY_AUDIT_EVENTS", "atlas_titan", "iemployee"]

hbase(main):004:0>
```
Note: you may get the error below if the HBase service is not running:

```
hbase(main):001:0> list
TABLE

ERROR: Can't get master address from ZooKeeper; znode data == null

Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:

  hbase> list
  hbase> list 'abc.*'
  hbase> list 'ns:abc.*'
  hbase> list 'ns:.*'

hbase(main):002:0>
```

To fix this, start the HBase service:

- Log in to Ambari as admin
- Start the HBase service: in the right panel, `HBase` -> `Service Actions` drop-down -> `Start`
- Creating a namespace: `create_namespace '<namespace_name>'`

```
hbase(main):005:0* create_namespace 'kaushik'
0 row(s) in 0.2850 seconds
```
- Use `list_namespace` to check all available namespaces:

```
hbase(main):006:0> list_namespace
NAMESPACE
default
hbase
kaushik
3 row(s) in 0.0950 seconds
```
- `default` and `hbase` are the two predefined namespaces
- Use `describe_namespace '<namespace_name>'` to describe a given namespace:

```
hbase(main):001:0> describe_namespace 'kaushik'
DESCRIPTION
{NAME => 'kaushik'}
1 row(s) in 0.6230 seconds
```
- A `namespace` is equivalent to what a `database` is in RDBMS terms
- Create a table under a namespace: `create '[<namespace_name>:]<table_name>', '<column_family>' [, '<column_family>', ...]`

```
# cf_1 is the (only) column family,
# ...no need to mention columns inside it
hbase(main):002:0> create 'kaushik:test', 'cf_1'
0 row(s) in 2.7700 seconds

=> Hbase::Table - kaushik:test
```
- If a namespace is not specified, the table will be created under the `default` namespace
- `list` will list all available tables:

```
hbase(main):001:0> list
TABLE
ATLAS_ENTITY_AUDIT_EVENTS
atlas_titan
iemployee
kaushik:test
4 row(s) in 0.2570 seconds

=> ["ATLAS_ENTITY_AUDIT_EVENTS", "atlas_titan", "iemployee", "kaushik:test"]
```
- Use `describe` to describe any table:

```
hbase(main):003:0> describe 'kaushik:test'
Table kaushik:test is ENABLED
kaushik:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf_1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1810 seconds
```
- The `drop` command is used to delete a table. Before it can be dropped, a table needs to be disabled using the `disable` command:

```
hbase(main):004:0> disable 'kaushik:test'
0 row(s) in 3.2250 seconds

hbase(main):005:0> drop 'kaushik:test'
```
- The `alter_namespace` command is used to alter an existing namespace:

```
hbase(main):005:0> alter_namespace 'kaushik', {METHOD => 'set', \
hbase(main):006:1* 'VERSION' => 'DRAFT' }
0 row(s) in 0.0570 seconds
```

- After the above command a new property `VERSION` is added to the namespace `kaushik`:

```
hbase(main):007:0> describe_namespace 'kaushik'
DESCRIPTION
{NAME => 'kaushik', VERSION => 'DRAFT'}
1 row(s) in 0.0130 seconds
```
- A namespace can be dropped using `drop_namespace`:

```
hbase(main):008:0> drop_namespace 'kaushik'
0 row(s) in 0.0680 seconds

hbase(main):009:0> list_namespace
NAMESPACE
default
hbase
2 row(s) in 0.6190 seconds

hbase(main):010:0>
```
- DML (CRUD) operations
- Create a table `customer` under the namespace `sales` (create the namespace first with `create_namespace 'sales'` if it does not exist) with versioning enabled:

```
# enable versioning (4) for the first column family, ctInfo
hbase(main):001:0> create 'sales:customer', \
hbase(main):002:0* {NAME => 'ctInfo', VERSIONS => 4}, \
hbase(main):003:0* 'demo'
0 row(s) in 2.7460 seconds

=> Hbase::Table - sales:customer
```

- Versioning needs to be enabled for each column family separately; if not specified, it defaults to 1
- Here we have enabled 4 versions for the column family `ctInfo`. Therefore, for each column in this column family at most 4 timestamped versions will be maintained per value. We can verify this by describing the created `sales:customer` table: for column family `ctInfo` we see `VERSIONS => '4'`, whereas for column family `demo` we see `VERSIONS => '1'`:

```
hbase(main):004:0> describe 'sales:customer'
Table sales:customer is ENABLED
sales:customer
COLUMN FAMILIES DESCRIPTION
{NAME => 'ctInfo', BLOOMFILTER => 'ROW', VERSIONS => '4', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'demo', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.3020 seconds
```
- `scan` is the equivalent of `select *`:

```
hbase(main):005:0> scan 'sales:customer'
ROW                COLUMN+CELL
0 row(s) in 0.0720 seconds
```

- The table has no rows yet because we have not added any data
- `put` adds data to a table. Data can be added for ONLY one column at a time:

```
# usage: put '<table>', '<row key>', '<column family:column>', '<value>'
hbase(main):006:0> put 'sales:customer', \
hbase(main):007:0* 'C00001', \
hbase(main):008:0* 'ctInfo:name', 'Sudhir Mishra'
0 row(s) in 0.0890 seconds

hbase(main):009:0> put 'sales:customer', \
hbase(main):010:0* 'C00001', \
hbase(main):011:0* 'ctInfo:mobile', '9988771122'
0 row(s) in 0.0110 seconds

hbase(main):012:0> put 'sales:customer', \
hbase(main):013:0* 'C00001', \
hbase(main):014:0* 'ctInfo:email', '[email protected]'
0 row(s) in 0.0130 seconds

hbase(main):015:0> put 'sales:customer', \
hbase(main):016:0* 'C00001', \
hbase(main):017:0* 'demo:age', '34'
0 row(s) in 0.0240 seconds

hbase(main):018:0> put 'sales:customer', \
hbase(main):019:0* 'C00001', \
hbase(main):020:0* 'demo:occupation', 'advocate'
0 row(s) in 0.0140 seconds

hbase(main):021:0> put 'sales:customer', \
hbase(main):022:0* 'C00002', \
hbase(main):023:0* 'ctInfo:name', 'Kunal K Bajaj'
0 row(s) in 0.0140 seconds

hbase(main):024:0> put 'sales:customer', \
hbase(main):025:0* 'C00002', \
hbase(main):026:0* 'ctInfo:tel', '02211223344'
0 row(s) in 0.0130 seconds

hbase(main):027:0> put 'sales:customer', \
hbase(main):028:0* 'C00002', \
hbase(main):029:0* 'demo:age', '42'
0 row(s) in 0.0110 seconds

hbase(main):030:0> put 'sales:customer', \
hbase(main):031:0* 'C00002', \
hbase(main):032:0* 'demo:education', 'graduate'
0 row(s) in 0.0110 seconds
```
- See the added data using `scan`:

```
hbase(main):033:0> scan 'sales:customer'
ROW COLUMN+CELL
C00001 column=ctInfo:email, timestamp=1616771164898, [email protected]
C00001 column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
C00001 column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
C00001 column=demo:age, timestamp=1616771187409, value=34
C00001 column=demo:occupation, timestamp=1616771214484, value=advocate
C00002 column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
C00002 column=ctInfo:tel, timestamp=1616771354236, value=02211223344
C00002 column=demo:age, timestamp=1616771379529, value=42
C00002 column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0530 seconds
```

- The logical view of the table will be:

```
{
"C00001" : {
"ctInfo" : {
1616771022652 : { "name" : "Sudhir Mishra" },
1616771062503 : { "mobile" : "9988771122" },
1616771164898 : { "email" : "[email protected]" }
},
"demo" : {
1616771187409 : { "age" : "34" },
1616771214484 : { "occupation" : "advocate" }
}
},
"C00002" : {
"ctInfo" : {
1616771311930 : { "name" : "Kunal K Bajaj" },
1616771354236 : { "tel" : "02211223344" }
},
"demo" : {
1616771379529 : { "age" : "42" },
1616771405214 : { "education" : "graduate" }
}
}
}
```

This is a good example of a sparse table:
| Row Key | ctInfo:name | ctInfo:mobile | ctInfo:email | ctInfo:tel | demo:age | demo:occupation | demo:education |
|---|---|---|---|---|---|---|---|
| C00001 | 1616771022652: Sudhir Mishra | 1616771062503: 9988771122 | 1616771164898: [email protected] | | 1616771187409: 34 | 1616771214484: advocate | |
| C00002 | 1616771311930: Kunal K Bajaj | | | 1616771354236: 02211223344 | 1616771379529: 42 | | 1616771405214: graduate |
- Now let us add a new email id for `C00001` and do a scan. We see that we get only the latest email:

```
hbase(main):034:0> put 'sales:customer', \
hbase(main):035:0* 'C00001', \
hbase(main):036:0* 'ctInfo:email', '[email protected]'
0 row(s) in 0.3130 seconds

hbase(main):037:0> scan 'sales:customer'
ROW     COLUMN+CELL
 C00001  column=ctInfo:email, timestamp=1616771484147, value=[email protected]
 C00001  column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
 C00001  column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
 C00001  column=demo:age, timestamp=1616771187409, value=34
 C00001  column=demo:occupation, timestamp=1616771214484, value=advocate
 C00002  column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
 C00002  column=ctInfo:tel, timestamp=1616771354236, value=02211223344
 C00002  column=demo:age, timestamp=1616771379529, value=42
 C00002  column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0550 seconds
```
- To get all the versions of the email column we need to specify `VERSIONS` for the scan:

```
hbase(main):039:0> scan 'sales:customer', {VERSIONS => 4}
ROW     COLUMN+CELL
 C00001  column=ctInfo:email, timestamp=1616771484147, value=[email protected]
 C00001  column=ctInfo:email, timestamp=1616771164898, value=[email protected]
 C00001  column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
 C00001  column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
 C00001  column=demo:age, timestamp=1616771187409, value=34
 C00001  column=demo:occupation, timestamp=1616771214484, value=advocate
 C00002  column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
 C00002  column=ctInfo:tel, timestamp=1616771354236, value=02211223344
 C00002  column=demo:age, timestamp=1616771379529, value=42
 C00002  column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0570 seconds
```

- Now we see both email ids
- `get` fetches a specific row by its key:

```
hbase(main):041:0> get 'sales:customer', 'C00001'
COLUMN             CELL
 ctInfo:email      timestamp=1616771484147, value=[email protected]
 ctInfo:mobile     timestamp=1616771062503, value=9988771122
 ctInfo:name       timestamp=1616771022652, value=Sudhir Mishra
 demo:age          timestamp=1616771187409, value=34
 demo:occupation   timestamp=1616771214484, value=advocate
5 row(s) in 0.0220 seconds
```

- Again, to get all versions we need to specify `VERSIONS`:

```
hbase(main):043:0> get 'sales:customer', 'C00001', {COLUMN => 'ctInfo', VERSIONS => 2}
COLUMN             CELL
 ctInfo:email      timestamp=1616771484147, value=[email protected]
 ctInfo:email      timestamp=1616771164898, value=[email protected]
 ctInfo:mobile     timestamp=1616771062503, value=9988771122
 ctInfo:name       timestamp=1616771022652, value=Sudhir Mishra
4 row(s) in 0.0410 seconds
```

- Now we see both email ids
- `delete` and `deleteall` delete a specific column value or a full row
- To delete a specific version of a column value we need to give its timestamp:

```
hbase(main):045:0> delete 'sales:customer', 'C00001', 'ctInfo:email', 1616771164898
0 row(s) in 0.0250 seconds

hbase(main):046:0> get 'sales:customer', 'C00001', {COLUMN => 'ctInfo', VERSIONS => 2}
COLUMN             CELL
 ctInfo:email      timestamp=1616771484147, value=[email protected]
 ctInfo:mobile     timestamp=1616771062503, value=9988771122
 ctInfo:name       timestamp=1616771022652, value=Sudhir Mishra
3 row(s) in 0.0280 seconds

hbase(main):047:0> scan 'sales:customer'
ROW     COLUMN+CELL
 C00001  column=ctInfo:email, timestamp=1616771484147, value=[email protected]
 C00001  column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
 C00001  column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
 C00001  column=demo:age, timestamp=1616771187409, value=34
 C00001  column=demo:occupation, timestamp=1616771214484, value=advocate
 C00002  column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
 C00002  column=ctInfo:tel, timestamp=1616771354236, value=02211223344
 C00002  column=demo:age, timestamp=1616771379529, value=42
 C00002  column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.0530 seconds

hbase(main):048:0> scan 'sales:customer', {VERSIONS => 4}
ROW     COLUMN+CELL
 C00001  column=ctInfo:email, timestamp=1616771484147, value=[email protected]
 C00001  column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
 C00001  column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
 C00001  column=demo:age, timestamp=1616771187409, value=34
 C00001  column=demo:occupation, timestamp=1616771214484, value=advocate
 C00002  column=ctInfo:name, timestamp=1616771311930, value=Kunal K Bajaj
 C00002  column=ctInfo:tel, timestamp=1616771354236, value=02211223344
 C00002  column=demo:age, timestamp=1616771379529, value=42
 C00002  column=demo:education, timestamp=1616771405214, value=graduate
2 row(s) in 0.1720 seconds
```

- Delete a full row with `deleteall`:

```
hbase(main):050:0> deleteall 'sales:customer', 'C00002'
0 row(s) in 0.0060 seconds

hbase(main):051:0> scan 'sales:customer', {VERSIONS => 4}
ROW     COLUMN+CELL
 C00001  column=ctInfo:email, timestamp=1616771484147, value=[email protected]
 C00001  column=ctInfo:mobile, timestamp=1616771062503, value=9988771122
 C00001  column=ctInfo:name, timestamp=1616771022652, value=Sudhir Mishra
 C00001  column=demo:age, timestamp=1616771187409, value=34
 C00001  column=demo:occupation, timestamp=1616771214484, value=advocate
1 row(s) in 0.0300 seconds
```
- We will create a `sales` table (`kaushik:sales`; if you dropped the `kaushik` namespace earlier, recreate it first with `create_namespace 'kaushik'`) and populate it with sales-order information
- From Ambari, start the HBase service, then open the HBase shell. Note that `create table` is not valid HBase shell syntax; the command is simply `create`:

```
[maria_dev@sandbox ~]$ su
Password:
[root@sandbox maria_dev]# su hbase
[hbase@sandbox maria_dev]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.5.0.0-1245, r53538b8ab6749cbb6fdc0fe448b89aa82495fb3f, Fri Aug 26 01:32:27 UTC 2016
hbase(main):001:0> create table "kaushik:sales", "cf_order"
NoMethodError: undefined method `table' for #<Object:0x3d3c886f>
hbase(main):007:0> create "kaushik:sales", "cf_order"
0 row(s) in 3.3440 seconds
=> Hbase::Table - kaushik:sales
hbase(main):008:0> describe "kaushik:sales"
Table kaushik:sales is ENABLED
kaushik:sales
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf_order', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.3510 seconds
hbase(main):009:0>
```
- We will use `salesOrders.csv` from the Internet, saved locally as `sales.csv`:

```
$ curl -o ./sales.csv https://raw.githubusercontent.com/bsullins/data/master/salesOrders.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  453k  100  453k    0     0   226k      0  0:00:02  0:00:02 --:--:--  200k

$ ls sales.csv
sales.csv

$ head sales.csv
RowKey,Order ID,Order Date,Ship Date,Ship Mode,Profit,Quantity,Sales
1,CA-2011-100006,2013-09-07 00:00:00,2013-09-13 00:00:00,Standard Class,109.61,3.00,377.97
2,CA-2011-100090,2013-07-08 00:00:00,2013-07-12 00:00:00,Standard Class,-19.09,9.00,699.19
3,CA-2011-100293,2013-03-14 00:00:00,2013-03-18 00:00:00,Standard Class,31.87,6.00,91.06
4,CA-2011-100328,2013-01-28 00:00:00,2013-02-03 00:00:00,Standard Class,1.33,1.00,3.93
5,CA-2011-100363,2013-04-08 00:00:00,2013-04-15 00:00:00,Standard Class,7.72,5.00,21.38
6,CA-2011-100391,2013-05-25 00:00:00,2013-05-29 00:00:00,Standard Class,6.73,2.00,14.62
7,CA-2011-100678,2013-04-18 00:00:00,2013-04-22 00:00:00,Standard Class,61.79,11.00,697.07
8,CA-2011-100706,2013-12-16 00:00:00,2013-12-18 00:00:00,Second Class,17.72,8.00,129.44
9,CA-2011-100762,2013-11-24 00:00:00,2013-11-29 00:00:00,Standard Class,219.08,11.00,508.62
```
- We will remove the 1st row (the header), since the import tool expects only data rows:

```
$ sed -i '1d' sales.csv
$ head sales.csv
1,CA-2011-100006,2013-09-07 00:00:00,2013-09-13 00:00:00,Standard Class,109.61,3.00,377.97
2,CA-2011-100090,2013-07-08 00:00:00,2013-07-12 00:00:00,Standard Class,-19.09,9.00,699.19
...
```
- We will transfer the downloaded file to the VM:

```
$ scp -P 2222 sales.csv maria_dev@localhost:/home/maria_dev/.
```
- Now we will copy the file `sales.csv` to HDFS:

```
$ ssh -p 2222 maria_dev@localhost
maria_dev@localhost's password:
Last login: Fri Mar 26 10:22:44 2021 from 10.0.2.2
[maria_dev@sandbox ~]$ head sales.csv
1,CA-2011-100006,2013-09-07 00:00:00,2013-09-13 00:00:00,Standard Class,109.61,3.00,377.97
2,CA-2011-100090,2013-07-08 00:00:00,2013-07-12 00:00:00,Standard Class,-19.09,9.00,699.19
...
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal sales.csv /tmp
[maria_dev@sandbox ~]$ hadoop fs -ls /tmp
Found 7 items
drwxrwxrwx - maria_dev hdfs 0 2020-03-05 17:09 /tmp/.pigjobs
drwxrwxrwx - maria_dev hdfs 0 2020-03-05 17:09 /tmp/.pigscripts
drwxrwxrwx - maria_dev hdfs 0 2020-03-05 17:09 /tmp/.pigstore
drwxr-xr-x - hdfs hdfs 0 2016-10-25 07:48 /tmp/entity-file-history
drwx-wx-wx - ambari-qa hdfs 0 2016-10-25 07:51 /tmp/hive
drwx------ - maria_dev hdfs 0 2020-03-05 17:14 /tmp/maria_dev
-rw-r--r-- 1 maria_dev hdfs 459327 2021-03-31 16:02 /tmp/sales.csv
[maria_dev@sandbox ~]$
```
- We will now load the `kaushik:sales` HBase table with data from `sales.csv` using the `ImportTsv` tool. `-Dimporttsv.separator` gives the field separator of the input file; `-Dimporttsv.columns` lists the target columns in input order, with `HBASE_ROW_KEY` marking the row-key field; the last two arguments are the target table name and the HDFS path of the input file:

```
[maria_dev@sandbox ~]$ su
[root@sandbox maria_dev]# hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns="HBASE_ROW_KEY,cf_order:orderId,cf_order:orderDate,cf_order:shipDate,cf_order:shipMode,cf_order:profit,cf_order:quantity,cf_order:sales" \
  kaushik:sales \
  hdfs://sandbox.hortonworks.com:/tmp/sales.csv
```
- This will fire a MapReduce job:

```
2021-03-31 16:39:02,868 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x62e7f11d connecting to ZooKeeper ensemble=sandbox.hortonworks.com:2181
2021-03-31 16:39:02,879 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1245--1, built on 08/26/2016 00:47 GMT
2021-03-31 16:39:02,879 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=sandbox.hortonworks.com
2021-03-31 16:39:02,879 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_111
...
```
- and the table will be filled with rows:

```
hbase(main):002:0> scan "kaushik:sales"
...
999 column=cf_order:shipDate, timestamp=1617208741407, value=2014-08-23 00:00:00
999 column=cf_order:shipMode, timestamp=1617208741407, value=Same Day
5009 row(s) in 11.8840 seconds
```
- HBase exposes a REST API that can be accessed directly from the client system
- We will use a Python client to create a sparse table from the movie rating data:

```
$ head u.data
0 50 5 881250949
0 172 5 881250949
0 133 1 881250949
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115     265     2       881171488
```
- The first column is the user id, the second is the movie id, the third is the user's rating for that movie, and the last is the rating timestamp
- In our sparse table we will keep only the movie ratings given by each user, so logically a row will look something like the diagram below (a small Python sketch of the same structure follows it):

```
+---------+       +--------------+      +---------------+      +---------------+
| user_id | ----> | movie_id: 50 | ---> | movie_id: 172 | ---> | movie_id: 133 |
+---------+       | rating: 5    |      | rating: 5     |      | rating: 1     |
                  +--------------+      +---------------+      +---------------+
```
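To make that layout concrete, here is a minimal Python sketch (my illustration, not part of the original walkthrough; it assumes `u.data` sits in the current directory) that parses the ratings file into the nested per-user structure we want to mirror in HBase:

```python
# Build the logical sparse structure in plain Python:
# one dict per user, mapping movie id -> that user's rating.
ratings_by_user = {}

with open('u.data') as f:
    for line in f:
        # u.data is tab-separated: user id, movie id, rating, timestamp
        user_id, movie_id, rating, _timestamp = line.split()
        ratings_by_user.setdefault(user_id, {})[movie_id] = rating

# For the sample rows above, user '0' maps to {'50': '5', '172': '5', '133': '1'}
print(ratings_by_user.get('0'))
```

Each user id plays the role of an HBase row key, and each movie id becomes a dynamically created column inside a single column family.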
- We will use the `starbase` Python package, which provides a convenient programming interface over the HBase REST API. It can be installed with `pip install starbase`
To use the HBase REST interface we need to first start the HBase REST server
[root@sandbox maria_dev]# /usr/hdp/current/hbase-master/bin/hbase-daemon.sh \
> start rest \ # start HBase REST service
> -p 8000 \ # ... on port 8000
> --infoport 8001 # ... and debugging information streaming at port 8001
starting rest, logging to /var/log/hbase/hbase-maria_dev-rest-sandbox.hortonworks.com.out
[root@sandbox maria_dev]#
- Also we need to enable the corresponding port forwarding for the VM:
  - Run the VM
  - Running VM -> Right Click -> Settings -> Network -> Advanced -> Port Forwarding -> add 2 new port forwarding rules by clicking the + icon:
| Name | Protocol | Host IP | Host Port | Guest IP | Guest Port |
|---|---|---|---|---|---|
| HBase REST | TCP | 127.0.0.1 | 8000 | | 8000 |
| HBase REST Info | TCP | 127.0.0.1 | 8001 | | 8001 |
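With the REST server running and the ports forwarded, the Python client itself can be sketched roughly as below. This is a sketch, not a verified transcript: the table name `ratings`, the column family `rating`, and the location of `u.data` (current directory on the client machine) are all my choices; the starbase calls follow the package's documented usage.

```python
from starbase import Connection

# Connect to the HBase REST server through the forwarded port
c = Connection('127.0.0.1', '8000')

ratings = c.table('ratings')

# Start from a clean slate on re-runs
if ratings.exists():
    ratings.drop()

# A single column family 'rating'; columns (movie ids) are created on the fly
ratings.create('rating')

# Batch the updates so we do not issue one REST call per input line
batch = ratings.batch()
with open('u.data') as f:
    for line in f:
        user_id, movie_id, rating, _timestamp = line.split()
        # row key = user id; column = rating:<movie id>; cell value = rating
        batch.update(user_id, {'rating': {movie_id: rating}})
batch.commit(finalize=True)

# Read one sparse row back: all movies rated by user '0'
print(ratings.fetch('0'))
```

Each user ends up as one row whose populated columns are exactly the movies that user rated, which is the sparse layout from the diagram above.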