HBase - cchantra/bigdata.github.io GitHub Wiki

Assume you have set up Java and Hadoop already.

Installing HBase

We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode, and Fully Distributed mode.

Installing HBase in Standalone Mode

wget  http://archive.apache.org/dist/hbase/1.4.13/hbase-1.4.13-bin.tar.gz
tar xvf hbase-1.4.13-bin.tar.gz
mv hbase-1.4.13 hbase

This guide keeps HBase under /home/hadoop/hbase. Alternatively, you can shift to super user mode and move the HBase folder to /usr/local.

  • Configuring HBase in Standalone Mode

Consider hbase-env.sh

Before proceeding with HBase, you have to edit the following files and configure HBase.

  • Set JAVA_HOME for HBase: open the hbase-env.sh file in the conf folder and change the JAVA_HOME environment variable to point at your current Java installation, as shown below.
cd /home/hadoop/hbase/conf 
vi hbase-env.sh

Then add

export JAVA_HOME="$(jrunscript -e 'java.lang.System.out.println(java.lang.System.getProperty("java.home"));')"
export HBASE_HOME=/home/hadoop/hbase 
export PATH=$PATH:/home/hadoop/hbase/bin 
export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib 
export HBASE_MANAGES_ZK=true

Or you can add these to .bashrc
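For example, the same variables can be appended to ~/.bashrc in one step (a sketch; the paths assume HBase lives in /home/hadoop/hbase, so adjust them to your layout):

```shell
# Append the HBase environment settings to ~/.bashrc so they persist
# across shell sessions (paths assume HBase lives in /home/hadoop/hbase).
cat >> ~/.bashrc <<'EOF'
export HBASE_HOME=/home/hadoop/hbase
export PATH=$PATH:$HBASE_HOME/bin
export CLASSPATH=$CLASSPATH:$HBASE_HOME/lib
EOF
# Then reload it with: source ~/.bashrc (or open a new terminal)
```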


Consider hbase-site.xml

This is the main configuration file of HBase, where you set the data directory to an appropriate location. Inside the conf folder under the HBase home directory (/home/hadoop/hbase) you will find several files; open the hbase-site.xml file as shown below.

cd /home/hadoop/hbase/conf    
vi hbase-site.xml

Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within them, set the HBase directory under the property key named "hbase.rootdir" as shown below.

<configuration>
   <!-- Set the path where you want HBase to store its files. -->
   <property>
      <name>hbase.rootdir</name>
      <value>file:/home/hadoop/hbase/HFiles</value>
   </property>
   <!-- Set the path where you want HBase to store its built-in ZooKeeper files. -->
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/home/hadoop/zookeeper</value>
   </property>
</configuration>

With this, the HBase installation and configuration part is complete. We can start HBase using the start-hbase.sh script provided in the bin folder of HBase. For that, open the HBase home folder and run the start script as shown below.

cd /home/hadoop/hbase/bin 
./start-hbase.sh

You'll see:

starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out

If everything goes well, the start script prints a message saying that HBase has started.

Installing HBase in Pseudo-Distributed Mode

CONFIGURING HBASE

Let us now check how HBase is installed in pseudo-distributed mode.

Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system and make sure they are running. Stop HBase if it is running.

Consider hbase-site.xml

Edit hbase-site.xml file to add the following properties.

<property>    
<name>hbase.cluster.distributed</name>    
<value>true</value>
</property>

This property sets the mode in which HBase should run. In the same file, change hbase.rootdir from a local file system path to the address of your HDFS instance, using the hdfs:// URI syntax. Here, HDFS is running on localhost at port 9000.

Don't forget to do:

hdfs dfs -mkdir /hbase_data

mkdir /home/hadoop/zookeeper

mkdir /home/hadoop/hbase

mkdir /home/hadoop/hbase/HFiles
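As a sketch, the same local directories can be created idempotently in one command; BASE is a convenience variable introduced here (substitute /home/hadoop to match the rest of this guide):

```shell
# Create the local directories used by ZooKeeper and HBase;
# mkdir -p makes the command safe to re-run.
BASE="$HOME"   # on the tutorial machine this is /home/hadoop
mkdir -p "$BASE/zookeeper" "$BASE/hbase/HFiles" "$BASE/hbase/tmp"
```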

Check your hbase-site.xml and add the following:

<property>    
<name>hbase.rootdir</name>    
<value>hdfs://localhost:9000/hbase_data</value>
</property>


For example, here is my hbase-site.xml:

<configuration>
   <!-- Set the path where you want HBase to store its files. -->
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/home/hadoop/zookeeper</value>
   </property>
   <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
   </property>
   <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:9000/hbase_data</value>
   </property>
   <property>
      <name>hbase.tmp.dir</name>
      <value>/home/hadoop/hbase/tmp</value>
      <description>Temporary directory on the local filesystem.</description>
   </property>
   <property>
      <name>hbase.master</name>
      <value>localhost:60000</value>
   </property>
   <property>
      <name>hbase.zookeeper.quorum</name>
      <value>localhost</value>
   </property>
   <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2182</value>
   </property>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>hbase.unsafe.stream.capability.enforce</name>
      <value>false</value>
   </property>
   <property>
      <name>zookeeper.znode.parent</name>
      <value>/hbase</value>
   </property>
   <property>
      <name>timeline.metrics.service.operation.mode</name>
      <value>distributed</value>
   </property>
</configuration>

Starting HBase

After configuration is over, browse to HBase home folder and start HBase using the following command.

cd /home/hadoop/hbase/bin 
 ./start-hbase.sh

Note: Before starting HBase, make sure Hadoop is running.


Checking the HBase Directory in HDFS

HBase creates its directory in HDFS. To see it, type the following command:

hdfs dfs -ls /hbase_data

In standalone mode, you can instead list the local root directory:

ls /home/hadoop/hbase/HFiles

total 48 
drwxrwxr-x 2 chantana chantana 4096 Feb 28 07:34 MasterProcWALs 
drwxrwxr-x 3 chantana chantana 4096 Feb 28 07:33 WALs 
drwxrwxr-x 2 chantana chantana 4096 Feb 27 23:50 archive 
drwxrwxr-x 2 chantana chantana 4096 Feb 27 23:50 corrupt 
drwxrwxr-x 4 chantana chantana 4096 Feb 27 23:50 data 
drwxrwxr-x 3 chantana chantana 4096 Feb 28 07:18 default 
drwxrwxr-x 4 chantana chantana 4096 Feb 27 23:50 hbase 
-rw-r--r-- 1 chantana chantana   42 Feb 27 23:50 hbase.id 
-rw-r--r-- 1 chantana chantana    7 Feb 27 23:50 hbase.version 
drwxrwxr-x 2 chantana chantana 4096 Feb 27 23:50 mobdir 
drwxrwxr-x 2 chantana chantana 4096 Feb 28 07:37 oldWALs 
drwx--x--x 2 chantana chantana 4096 Feb 27 23:50 staging

Misc

  • Starting and Stopping a Master

Using the “local-master-backup.sh” script you can start up to 10 backup masters. Open the HBase home folder and execute the following command to start them; each number is a port offset for one backup master. Assume you are under /home/hadoop/hbase.

 ./bin/local-master-backup.sh start 2 4

To kill a backup master, you need its process id, which is stored in a file named /tmp/hbase-USER-X-master.pid. You can kill the backup master using the following command.

cat /tmp/hbase-user-1-master.pid |xargs kill -9
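As a sketch, the pid-file lookup and kill can be wrapped in a small helper; stop_backup_master is a hypothetical function (not part of HBase) that just automates the command above:

```shell
# Hypothetical helper (not part of HBase): stop the backup master started
# with offset N by reading the pid file local-master-backup.sh writes to /tmp.
stop_backup_master() {
  : "${USER:=$(id -un)}"   # fall back if USER is unset
  pidfile="/tmp/hbase-${USER}-${1}-master.pid"
  if [ -f "$pidfile" ]; then
    kill -9 "$(cat "$pidfile")" && rm -f "$pidfile"
  else
    echo "no pid file: $pidfile" >&2
    return 1
  fi
}
```

Usage: `stop_backup_master 2` stops the backup master started with offset 2.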

  • Starting and Stopping RegionServers

You can run multiple region servers from a single system using the following command. Assume you are under /home/hadoop/hbase

./bin/local-regionservers.sh start 2 3

To stop a region server, use the following command.

./bin/local-regionservers.sh stop 3


  • Starting the HBase Shell

After installing HBase successfully, you can start the HBase shell. The following sequence of steps starts it. Open the terminal and log in as super user.

Start the Hadoop file system: browse to the Hadoop sbin folder and start HDFS as shown below.

cd $HADOOP_HOME/sbin 
start-all.sh

Then start HBase: browse to the HBase root directory and start HBase.

cd /home/hadoop/hbase 
./bin/start-hbase.sh

Start a backup HBase master server from the same directory, as shown below.

./bin/local-master-backup.sh start 2 (the number is the port offset for that backup master)

Start a region server as shown below.

./bin/local-regionservers.sh start 3

You should see the following:

hadoop@bigdata:~/hbase$ ./bin/local-master-backup.sh start 2 4
running master, logging to /home/hadoop/hbase/logs/hbase-hadoop-2-master-bigdata.out
running master, logging to /home/hadoop/hbase/logs/hbase-hadoop-4-master-bigdata.out

hadoop@bigdata:~/hbase$ ./bin/start-hbase.sh
localhost: running zookeeper, logging to /home/hadoop/hbase/logs/hbase-hadoop-zookeeper-bigdata.out
running master, logging to /home/hadoop/hbase/logs/hbase-hadoop-master-bigdata.out
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
: running regionserver, logging to /home/hadoop/hbase/logs/hbase-hadoop-regionserver-bigdata.out
: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
  • Check your processes after starting HBase:
hadoop@bigdata:~/hbase$ jps
13489 HRegionServer
5363 ResourceManager
13366 HMaster
13302 HQuorumPeer
5529 NodeManager
5147 SecondaryNameNode
4731 NameNode
4892 DataNode

Then start the HBase shell using the following command.

cd /home/hadoop/hbase/bin 
./hbase shell

This will give you the HBase Shell Prompt as shown below.

2014-12-09 14:24:27,526 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri Nov 14 18:26:29 PST 2014
hbase(main):001:0>

Type quit to exit the HBase shell.

  • HBase Web Interface

To access the web interface of HBase, type the following url in the browser.

http://localhost:16010
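If the default port is taken, the master UI port can be changed in hbase-site.xml. The fragment below is a sketch showing hbase.master.info.port, the HBase 1.x property for the master web UI, with its default value:

```xml
<!-- Port for the HMaster web UI; 16010 is the default in HBase 1.x. -->
<property>
   <name>hbase.master.info.port</name>
   <value>16010</value>
</property>
```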


On my machine, I type the following in the web browser (in some cases you may need to tunnel the port):

http://10.3.133.231:16010/

This interface lists your currently running Region servers, backup masters and HBase tables.

  • HBase Region servers and Backup Masters

HBase Tables

Setting the Java Environment and the Classpath. We can also communicate with HBase using Java libraries, but before accessing HBase through the Java API you need to set the classpath to the HBase libraries (the lib folder in HBase), as shown below. This prevents the “class not found” exception when accessing HBase from the Java API.

Before proceeding with programming, set the classpath to the HBase libraries in the .bashrc file. Open .bashrc in any editor as shown below.

 vi ~/.bashrc

add

export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib/*

Procedure: Use HBase For the First Time

    • Connect to HBase.

Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.

 ./bin/hbase shell
hbase(main):001:0>

Display HBase Shell Help Text.

Type help and press Enter to display basic usage information for the HBase shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.

    • Create a table

Use the create command to create a new table. You must specify the table name and the ColumnFamily name.

hbase(main):001:0> create 'test', 'cf'
 0 row(s) in 0.4170 seconds 
 => Hbase::Table - test

List Information About your Table

Use the list command to list information about your table.

hbase(main):002:0> list 'test' 
TABLE
test
1 row(s) in 0.0180 seconds
 => ["test"]

Put data into your table.

To put data into your table, use the put command.

hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1' 
0 row(s) in 0.0850 seconds  
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2' 
0 row(s) in 0.0110 seconds  
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3' 
0 row(s) in 0.0100 seconds

Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase consist of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.

Scan the table for all data at once.

One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.

hbase(main):006:0> scan 'test' 
ROW                COLUMN+CELL
 row1              column=cf:a, timestamp=1421762485768, value=value1
 row2              column=cf:b, timestamp=1421762491785, value=value2
 row3              column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds

Get a single row of data.

To get a single row of data at a time, use the get command.

hbase(main):007:0> get 'test', 'row1' 
COLUMN             CELL
 cf:a              timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds

Disable a table.

If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.

hbase(main):008:0> disable 'test' 
0 row(s) in 1.1820 seconds  
hbase(main):009:0> enable 'test' 
0 row(s) in 0.1770 seconds

Disable the table again if you tested the enable command above:

hbase(main):010:0> disable 'test' 
0 row(s) in 1.1820 seconds

Drop the table.

To drop (delete) a table, use the drop command.

hbase(main):011:0> drop 'test' 
0 row(s) in 0.1370 seconds

Exit the HBase Shell.

To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.

Procedure: Stop HBase

In the same way that the bin/start-hbase.sh script conveniently starts all HBase daemons, the bin/stop-hbase.sh script stops them.

 ./bin/stop-hbase.sh 
stopping hbase.................... $

Note: you can skip the Java part below.

After issuing the command, it can take several minutes for the processes to shut down. Use jps to be sure that the HMaster and HRegionServer processes are shut down.

Create file 'InsertData.java'

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class InsertData {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // try-with-resources closes the connection and admin handle for us.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin hAdmin = conn.getAdmin()) {
            // Create table "test3" with a single column family "cf".
            HTableDescriptor hTableDesc = new HTableDescriptor(TableName.valueOf("test3"));
            hTableDesc.addFamily(new HColumnDescriptor("cf"));
            hAdmin.createTable(hTableDesc);
            System.out.println("Table created Successfully...");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

To run InsertData.java

javac -cp $(hbase classpath):$(hadoop classpath) InsertData.java
java -cp $(hbase classpath):$(hadoop classpath):. InsertData

Testing an HBase program

Create file TestHBase.java

/*
 * Compile and run with:
 *   javac -cp $(hbase classpath):$(hadoop classpath) TestHBase.java
 *   java -cp $(hbase classpath):$(hadoop classpath):. TestHBase
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.*;

public class TestHBase {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // try-with-resources closes the connection and table handle for us.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test3"))) {
            // Scan column cf:a of table "test3" and print every row found.
            Scan scan = new Scan();
            scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"));
            ResultScanner scanner = table.getScanner(scan);
            for (Result result = scanner.next(); result != null; result = scanner.next()) {
                System.out.println("Found row : " + result);
            }
            scanner.close();
        }
    }
}

compile

javac -cp $(hbase classpath):$(hadoop classpath) TestHBase.java

Run:

 java -cp $(hbase classpath):$(hadoop classpath):. TestHBase 

Note: there is no data yet. Try to insert data using the HBase shell:

hbase(main):015:0> put 'test3', 'row1', 'cf:a', 'value1'

Result:

...

Found row : keyvalues={row1/cf:a/1473598708060/Put/vlen=6/seqid=0}

Importing data:

First, create test.csv containing:

id,temp:in,temp:out

555,50,30
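For example, the file can be written in one step from the shell:

```shell
# Write the two-line sample CSV (header plus one data row).
cat > test.csv <<'EOF'
id,temp:in,temp:out
555,50,30
EOF
```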

Then call:

hbase shell

and create the table:

hbase(main):001:0> create 'test2', 'id', 'temp'

Exit back to the OS shell and copy the file to HDFS:

hdfs dfs -mkdir /data

hdfs dfs -copyFromLocal test.csv /data/test.csv

Import the data at the OS shell prompt:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,temp:in,temp:out" test2 hdfs://localhost:9000/data/test.csv
...

2016-09-11 22:34:15,273 INFO  [main] mapreduce.Job: Job job_1473575783808_0006 completed successfully
2016-09-11 22:34:15,409 INFO  [main] mapreduce.Job: Counters: 31
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=136779
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=130
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=2
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Launched map tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2706
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=2706
		Total vcore-seconds taken by all map tasks=2706
		Total megabyte-seconds taken by all map tasks=2770944
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Input split bytes=100
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=60
		CPU time spent (ms)=1880
		Physical memory (bytes) snapshot=186990592
		Virtual memory (bytes) snapshot=2193063936
		Total committed heap usage (bytes)=143130624
	ImportTsv
		Bad Lines=0
	File Input Format Counters
		Bytes Read=30
	File Output Format Counters
		Bytes Written=0

Then call:

hbase shell
hbase(main):001:0> scan 'test2'

ROW                COLUMN+CELL
 555               column=temp:in, timestamp=1473660169821, value=50
 555               column=temp:out, timestamp=1473660169821, value=30
 id                column=temp:in, timestamp=1473660169821, value=temp:in
 id                column=temp:out, timestamp=1473660169821, value=temp:out
2 row(s) in 0.3270 seconds

Note that the CSV header line was imported as row key id; in practice, strip the header before importing, or delete that row afterward.

See more examples: https://github.com/khodeprasad/java-hbase/tree/master/src/main/java/com/khodeprasad/hbase

https://www.tutorialspoint.com/hbase/hbase_create_data.htm

(https://community.hortonworks.com/articles/4942/import-csv-data-into-hbase-using-importtsv.html)

Try it with the following data: Household_power....

(http://archive.ics.uci.edu/ml/)

Try to organize the data into columns and import it.

Try to make useful queries from it.

Downloads

https://github.com/cchantra/bigdata.github.io/tree/master/hbase

Try happybase for HBase in Python:

To create and scan a table: Notebook2

To import data from CSV: Notebook

References

http://www.paul4llen.com/installing-apache-hbase-on-centos-6/

http://www.tutorialspoint.com/hbase/hbase_installation.htm

http://hbase.apache.org/book.html#quickstart

https://hbase.apache.org/book.html

http://www.cloudera.com/resources/training/intorduction-hbase-todd-lipcon.html
