HBase
Assume you have setup Java and Hadoop already.
We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode, and Fully Distributed mode.
- Download the latest stable version of HBase from https://apache.mirror.gtcomm.net/hbase/
using the wget command, and extract it using tar, as shown below. E.g.
wget http://archive.apache.org/dist/hbase/1.4.13/hbase-1.4.13-bin.tar.gz
tar xvf hbase-1.4.13-bin.tar.gz
mv hbase-1.4.13 hbase
Here HBase is kept under the hadoop user's home directory (/home/hadoop/hbase). If you prefer to install under /usr/local instead, shift to super user mode before moving the folder, and adjust the paths below accordingly.
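As a quick sanity check (optional; this assumes Java is already installed, per the note at the top), you can ask the unpacked distribution for its version:
cd /home/hadoop/hbase
./bin/hbase version   # prints the HBase version if the unpack succeeded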
- Configuring HBase in Standalone Mode
Consider hbase-env.sh
Before proceeding with HBase, you have to edit the following files and configure HBase.
- Set the Java home for HBase by opening the hbase-env.sh file in the conf folder. Edit the JAVA_HOME environment variable and change the existing path to your current JAVA_HOME value as shown below.
cd /home/hadoop/hbase/conf
vi hbase-env.sh
Then add
export JAVA_HOME="$(jrunscript -e 'java.lang.System.out.println(java.lang.System.getProperty("java.home"));')"
export HBASE_HOME=/home/hadoop/hbase
export PATH=$PATH:/home/hadoop/hbase/bin
export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib
export HBASE_MANAGES_ZK=true
Or you can add these to .bashrc
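If you put them in .bashrc, reload it and confirm the variables are visible in your session (a quick check):
source ~/.bashrc
echo $JAVA_HOME
echo $HBASE_HOME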
Consider hbase-site.xml
This is the main configuration file of HBase. Set the data directory to an appropriate location. Inside the conf folder of the HBase home directory (/home/hadoop/hbase here), you will find several files; open the hbase-site.xml file as shown below.
cd /home/hadoop/hbase/conf
vi hbase-site.xml
Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within them, set the HBase directory under the property with the name "hbase.rootdir" as shown below.
<configuration>
<!-- Here you have to set the path where you want HBase to store its files. -->
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/hbase/HFiles</value>
</property>
<!-- Here you have to set the path where you want HBase to store its built-in ZooKeeper files. -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
With this, the HBase installation and configuration is complete. We can start HBase using the start-hbase.sh script provided in the bin folder of HBase. Open the HBase bin folder and run the start script as shown below.
cd /home/hadoop/hbase/bin
./start-hbase.sh
You'll see:
starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out
If everything goes well, when you try to run HBase start script, it will prompt you a message saying that HBase has started.
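You can also confirm with jps. In standalone mode the master, a region server, and ZooKeeper all run inside a single JVM, so a single HMaster process is expected:
jps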
- Configuring HBase in Pseudo-Distributed Mode
Let us now check how HBase is installed in pseudo-distributed mode.
Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote system and make sure they are running. Stop HBase if it is running.
Consider hbase-site.xml
Edit hbase-site.xml file to add the following properties.
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
This property tells HBase which mode it should run in. In the same file, change hbase.rootdir from the local filesystem to your HDFS instance address, using the hdfs:// URI syntax. Here we are running HDFS on localhost at port 9000.
Don't forget to do:
hdfs dfs -mkdir /hbase_data
mkdir /home/hadoop/zookeeper
mkdir /home/hadoop/hbase
mkdir /home/hadoop/hbase/HFiles
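You can verify that these directories exist before continuing (an optional check; adjust the paths if yours differ):
hdfs dfs -ls /                  # should list /hbase_data
ls -ld /home/hadoop/zookeeper /home/hadoop/hbase/HFiles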
Check your hbase-site.xml and add the following:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase_data</value>
</property>
For example, in my hbase-site.xml
<configuration>
<!-- Where the HBase-managed ZooKeeper stores its data on the local filesystem. -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase_data</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hadoop/hbase/tmp</value>
<description>Temporary directory on the local filesystem.</description>
</property>
<property>
<name>hbase.master</name>
<value>localhost:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2182</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>timeline.metrics.service.operation.mode</name>
<value>distributed</value>
</property>
</configuration>
Starting HBase
After configuration is over, browse to HBase home folder and start HBase using the following command.
cd /home/hadoop/hbase/bin
./start-hbase.sh
Note: Before starting HBase, make sure Hadoop is running.
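Because this configuration uses a non-default ZooKeeper client port (2182), once HBase is up you can check that the HBase-managed ZooKeeper is listening using ZooKeeper's ruok four-letter command (this assumes nc is installed; a healthy server answers imok):
echo ruok | nc localhost 2182   # port from hbase.zookeeper.property.clientPort above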
Checking the HBase Directory
HBase creates its directory under the configured rootdir. In standalone mode the files are on the local filesystem; list them as shown below.
ls -l /home/hadoop/hbase/HFiles
total 48
drwxrwxr-x 2 chantana chantana 4096 Feb 28 07:34 MasterProcWALs
drwxrwxr-x 3 chantana chantana 4096 Feb 28 07:33 WALs
drwxrwxr-x 2 chantana chantana 4096 Feb 27 23:50 archive
drwxrwxr-x 2 chantana chantana 4096 Feb 27 23:50 corrupt
drwxrwxr-x 4 chantana chantana 4096 Feb 27 23:50 data
drwxrwxr-x 3 chantana chantana 4096 Feb 28 07:18 default
drwxrwxr-x 4 chantana chantana 4096 Feb 27 23:50 hbase
-rw-r--r-- 1 chantana chantana 42 Feb 27 23:50 hbase.id
-rw-r--r-- 1 chantana chantana 7 Feb 27 23:50 hbase.version
drwxrwxr-x 2 chantana chantana 4096 Feb 27 23:50 mobdir
drwxrwxr-x 2 chantana chantana 4096 Feb 28 07:37 oldWALs
drwx--x--x 2 chantana chantana 4096 Feb 27 23:50 staging
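In pseudo-distributed mode the root directory lives in HDFS instead (hdfs://localhost:9000/hbase_data in the configuration above), so list it there:
hdfs dfs -ls /hbase_data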
- Starting and Stopping a Master
Using the “local-master-backup.sh” script you can start up to 10 backup masters. Open the HBase home folder and execute the following command to start backup masters at offsets 2 and 4. Assume you are under /home/hadoop/hbase.
./bin/local-master-backup.sh start 2 4
To kill a backup master, you need its process id, which is stored in a file named /tmp/hbase-USER-X-master.pid, where X is the server offset. You can kill a backup master using the following command (here for offset 1):
cat /tmp/hbase-user-1-master.pid |xargs kill -9
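Alternatively, the same helper script can stop a backup master by the offset you started it with:
./bin/local-master-backup.sh stop 2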
- Starting and Stopping RegionServers
You can run multiple region servers from a single system using the following command. Assume you are under /home/hadoop/hbase
./bin/local-regionservers.sh start 2 3
To stop a region server, use the following command.
./bin/local-regionservers.sh stop 3
- Starting HBase Shell
After installing HBase successfully, you can start the HBase shell. The following sequence of steps brings everything up. Open a terminal and log in as the hadoop user.
Start the Hadoop file system: browse to the Hadoop sbin folder and start the Hadoop file system as shown below.
cd $HADOOP_HOME/sbin
./start-all.sh
Then, Start HBase Browse through the HBase root directory bin folder and start HBase.
cd /home/hadoop/hbase
./bin/start-hbase.sh
Start HBase Master Server: this is done from the same directory. Start it as shown below.
./bin/local-master-backup.sh start 2 (the number signifies a specific server offset)
Start a Region Server: start the region server as shown below.
./bin/local-regionservers.sh start 3
You should see the following:
hadoop@bigdata:~/hbase$ ./bin/local-master-backup.sh start 2 4
running master, logging to /home/hadoop/hbase/logs/hbase-hadoop-2-master-bigdata.out
running master, logging to /home/hadoop/hbase/logs/hbase-hadoop-4-master-bigdata.out
hadoop@bigdata:~/hbase$ ./bin/start-hbase.sh
localhost: running zookeeper, logging to /home/hadoop/hbase/logs/hbase-hadoop-zookeeper-bigdata.out
running master, logging to /home/hadoop/hbase/logs/hbase-hadoop-master-bigdata.out
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
: running regionserver, logging to /home/hadoop/hbase/logs/hbase-hadoop-regionserver-bigdata.out
: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
- Check your processes once HBase has started. In addition to the Hadoop daemons, you should see HMaster, HRegionServer, and HQuorumPeer (the ZooKeeper instance managed by HBase):
hadoop@bigdata:~/hbase$ jps
13489 HRegionServer
5363 ResourceManager
13366 HMaster
13302 HQuorumPeer
5529 NodeManager
5147 SecondaryNameNode
4731 NameNode
4892 DataNode
Then start the HBase shell using the following command.
cd /home/hadoop/hbase/bin
./hbase shell
This will give you the HBase Shell Prompt as shown below.
2014-12-09 14:24:27,526 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri Nov 14 18:26:29 PST 2014
hbase(main):001:0>
Type quit to exit the HBase shell.
- HBase Web Interface
To access the web interface of HBase, open the HBase Master URL in a browser. In HBase 1.x the master info server listens on port 16010 by default, so on this machine the URL is http://localhost:16010.
(You may need to tunnel the port in some cases.)
This interface lists your currently running Region servers, backup masters and HBase tables.
- Setting the Java Environment and the Classpath
We can also communicate with HBase using Java libraries, but before accessing HBase through the Java API you need to set the classpath for the HBase libraries (the lib folder in HBase); this prevents a "class not found" exception when accessing HBase from Java code. Before proceeding with programming, set the classpath to the HBase libraries in the .bashrc file. Open .bashrc in any editor as shown below.
vi ~/.bashrc
add
export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib/*
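Reload .bashrc so the classpath takes effect in your current session:
source ~/.bashrc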
- Usage Example: (http://hbase.apache.org/book.html#quickstart)
Procedure: Use HBase For the First Time
- Connect to HBase.
Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.
./bin/hbase shell
hbase(main):001:0>
Display HBase Shell Help Text.
Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.
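For example, asking for help on a single command also takes a quoted name:
hbase(main):001:0> help 'create'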
- Create a table
Use the create command to create a new table. You must specify the table name and the ColumnFamily name.
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
List Information About your Table.
Use the list command to confirm your table exists.
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
Put data into your table.
To put data into your table, use the put command.
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
Scan the table for all data at once.
One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.
hbase(main):006:0> scan 'test'
ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1421762485768, value=value1
 row2                 column=cf:b, timestamp=1421762491785, value=value2
 row3                 column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
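The scan above fetches everything; to limit it, you can pass options in the shell's hash syntax, for example:
hbase(main):006:0> scan 'test', {LIMIT => 2}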
Get a single row of data.
To get a single row of data at a time, use the get command.
hbase(main):007:0> get 'test', 'row1'
COLUMN                CELL
 cf:a                 timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
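You can also narrow a get to a single column by naming it:
hbase(main):007:0> get 'test', 'row1', 'cf:a'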
Disable a table.
If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
Disable the table again if you tested the enable command above:
hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
Drop the table.
To drop (delete) a table, use the drop command.
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
Exit the HBase Shell.
To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.
Procedure: Stop HBase
In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.
./bin/stop-hbase.sh
stopping hbase.................... $
After issuing the command, it can take several minutes for the processes to shut down. Use jps to be sure that the HMaster and HRegionServer processes are shut down.
Note: you can skip the Java part below.
Create file 'InsertData.java'
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class InsertData {
    public static void main(String[] args) {
        // Reads hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        try {
            Connection conn = ConnectionFactory.createConnection(conf);
            Admin hAdmin = conn.getAdmin();
            // Despite the file name, this program only creates the table
            // "test3" with a single column family "cf".
            HTableDescriptor hTableDesc = new HTableDescriptor(TableName.valueOf("test3"));
            hTableDesc.addFamily(new HColumnDescriptor("cf"));
            hAdmin.createTable(hTableDesc);
            System.out.println("Table created Successfully...");
            hAdmin.close();
            conn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
To compile and run InsertData.java (note that, despite its name, this program creates the table test3; we will insert data from the shell later):
javac -cp $(hbase classpath):$(hadoop classpath) InsertData.java
java -cp $(hbase classpath):$(hadoop classpath):. InsertData
Testing the HBase program
Create file TestHBase.java
/*
 * Compile and run with:
 *   javac -cp $(hbase classpath):$(hadoop classpath) TestHBase.java
 *   java -cp $(hbase classpath):$(hadoop classpath):. TestHBase
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.*;

public class TestHBase {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Open a connection and scan column cf:a of table "test3".
        Connection conn = ConnectionFactory.createConnection(conf);
        try {
            Table table = conn.getTable(TableName.valueOf("test3"));
            Scan scan = new Scan();
            scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"));
            ResultScanner scanner = table.getScanner(scan);
            for (Result result = scanner.next(); result != null; result = scanner.next())
                System.out.println("Found row : " + result);
            scanner.close();
            table.close();
        } finally {
            conn.close();
        }
    }
}
Compile:
javac -cp $(hbase classpath):$(hadoop classpath) TestHBase.java
Run:
java -cp $(hbase classpath):$(hadoop classpath):. TestHBase
Note: There is no data yet. Try inserting data using the HBase shell:
hbase(main):015:0> put 'test3', 'row1', 'cf:a', 'value1'
Result:
...
Found row : keyvalues={row1/cf:a/1473598708060/Put/vlen=6/seqid=0}
Importing data:
First, create test.csv and add:
id,temp:in,temp:out
555,50,30
Call:
hbase shell
Then create the table:
hbase(main):001:0> create 'test2','id', 'temp'
Then exit the HBase shell and copy the file to HDFS:
hdfs dfs -mkdir /data
hdfs dfs -copyFromLocal test.csv /data/test.csv
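You can confirm the file landed in HDFS before importing (an optional check):
hdfs dfs -cat /data/test.csv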
Import data at shell prompt:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,temp:in,temp:out" test2 hdfs://localhost:9000/data/test.csv
...
2016-09-11 22:34:15,273 INFO [main] mapreduce.Job: Job job_1473575783808_0006 completed successfully
2016-09-11 22:34:15,409 INFO [main] mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=136779
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=130
HDFS: Number of bytes written=0
HDFS: Number of read operations=2
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2706
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2706
Total vcore-seconds taken by all map tasks=2706
Total megabyte-seconds taken by all map tasks=2770944
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=100
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=60
CPU time spent (ms)=1880
Physical memory (bytes) snapshot=186990592
Virtual memory (bytes) snapshot=2193063936
Total committed heap usage (bytes)=143130624
ImportTsv
Bad Lines=0
File Input Format Counters
Bytes Read=30
File Output Format Counters
Bytes Written=0
Call:
hbase shell
hbase(main):001:0> scan 'test2'
ROW COLUMN+CELL
555 column=temp:in, timestamp=1473660169821, value=50
555 column=temp:out, timestamp=1473660169821, value=30
id column=temp:in, timestamp=1473660169821, value=temp:in
id column=temp:out, timestamp=1473660169821, value=temp:out
2 row(s) in 0.3270 seconds
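Notice that the CSV header line was imported as a data row whose key is the literal string id. If you do not want it, you can remove that row from the HBase shell with deleteall, which deletes all cells of a given row:
hbase(main):002:0> deleteall 'test2', 'id'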
See more examples: https://github.com/khodeprasad/java-hbase/tree/master/src/main/java/com/khodeprasad/hbase
https://www.tutorialspoint.com/hbase/hbase_create_data.htm
(https://community.hortonworks.com/articles/4942/import-csv-data-into-hbase-using-importtsv.html)
Try with the following data: Household_power....
(http://archive.ics.uci.edu/ml/)
Try to organize the data into columns and import it.
Try to make useful queries from it.
See https://github.com/cchantra/bigdata.github.io/tree/master/hbase for sample code to create and scan a table and to import data from a CSV file.
References:
http://www.paul4llen.com/installing-apache-hbase-on-centos-6/
http://www.tutorialspoint.com/hbase/hbase_installation.htm
http://hbase.apache.org/book.html#quickstart
https://hbase.apache.org/book.html
http://www.cloudera.com/resources/training/intorduction-hbase-todd-lipcon.html