Hbase - noonecare/opensourcebigdatatools GitHub Wiki

HBase Features

Application

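HBase fits workloads with massive data volumes where the schema is not known in advance, records must be inserted, deleted, updated, and queried, and access needs to be real-time.

It supports CRUD (create, read, update, delete).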

HBase Architecture

  • Master Node

The Master node assigns regions to RegionServers and handles load balancing. An HBase cluster has only one active Master node at a time.
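To make the Master's role concrete, here is a minimal sketch of region assignment. It assumes a plain round-robin policy, which is only a stand-in: HBase's real balancer is more elaborate, and the function name `assign_regions` and the server and region names are hypothetical.

```python
from collections import defaultdict


def assign_regions(regions, servers):
    """Spread regions over servers round-robin (a simplified stand-in
    for the Master's balancer, not HBase's actual algorithm)."""
    assignment = defaultdict(list)
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return dict(assignment)


# Hypothetical cluster: five regions, two RegionServers.
regions = [f"region-{i}" for i in range(5)]
servers = ["rs1", "rs2"]
plan = assign_regions(regions, servers)

for server, hosted in plan.items():
    print(server, hosted)
```

With this policy the region counts on any two servers differ by at most one, which is the basic property a balancer aims for.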

  • RegionServers

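A table's data actually lives in Regions, which are hosted by RegionServers. Each Region involves three parts: the HLog (the write-ahead log), the MemStore (an in-memory write cache), and the HStore (whose on-disk files are HFiles; these hold the data and are saved on HDFS). An HFile is a sorted key-value map.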
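The interplay of the three parts can be sketched with a toy model. This is an illustration under assumptions, not HBase's implementation: the `Region` class, its `flush_threshold` parameter, and the in-memory lists standing in for the log and for files on HDFS are all hypothetical.

```python
class Region:
    """Toy model of a region's write path (not HBase's real code)."""

    def __init__(self, flush_threshold=3):
        self.hlog = []        # write-ahead log: append-only list of edits
        self.memstore = {}    # in-memory buffer of recent writes
        self.hfiles = []      # immutable sorted files (on HDFS in real HBase)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.hlog.append((key, value))   # 1. log the edit first for durability
        self.memstore[key] = value       # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. persist the buffer as a sorted, immutable key-value file
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        # Check the newest data first: memstore, then files newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


region = Region(flush_threshold=2)
region.put(b"row2", b"b")
region.put(b"row1", b"a")   # second write reaches the threshold and flushes
region.put(b"row3", b"c")   # stays in the memstore

print(region.hfiles)
print(region.memstore)
```

Reads consult the memstore before the flushed files, so the most recent write for a key wins even when older copies exist on disk.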

Storage Model

Data Model

Tables are stored by row, ordered by row key.

During table creation, column families should be defined.

  • Each family consists of any number of columns.
  • Each column consists of any number of versions.
  • Columns exist only once they are inserted, so NULLs are free (an absent column takes no storage).
  • Columns in a family are sorted and stored together.
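The rules in this list can be sketched as a small sparse table. This is a toy model under assumptions, not the HBase client API: the `Table` class, its methods, and the sample table `users` are hypothetical names chosen for illustration.

```python
class Table:
    """Toy sparse table: families are fixed at creation, columns are not."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = {}   # row key -> family -> qualifier -> value

    def put(self, row, family, qualifier, value):
        # Families must have been declared up front; columns may be new.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def columns(self, row, family):
        # Return qualifiers sorted, mirroring sorted storage within a family.
        return sorted(self.rows.get(row, {}).get(family, {}))


users = Table("users", families=[b"info", b"stats"])
users.put(b"u1", b"info", b"name", b"Ada")
users.put(b"u1", b"info", b"email", b"ada@example.com")
users.put(b"u2", b"info", b"name", b"Bob")   # u2 simply has no email column

print(users.columns(b"u1", b"info"))
print(users.columns(b"u2", b"info"))
```

Row `u2` stores nothing for `email`, which is what "NULLs are free" means here: a missing column is absent from storage rather than held as an empty placeholder.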

Everything except table names is stored as byte arrays.
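One practical consequence of byte-array storage is that keys compare byte by byte, not numerically. The sketch below shows the effect and one common workaround, fixed-width big-endian encoding. The helper `encode_int` and the sample keys are assumptions for illustration, not part of HBase.

```python
import struct

# Text-like keys compare byte by byte, so "row10" sorts before "row2".
row_keys = [b"row10", b"row9", b"row2"]
print(sorted(row_keys))


def encode_int(n):
    """Encode a non-negative integer as 4 big-endian bytes so that
    byte order matches numeric order (hypothetical helper)."""
    return struct.pack(">I", n)


encoded = sorted(encode_int(n) for n in (10, 9, 2))
decoded = [struct.unpack(">I", key)[0] for key in encoded]
print(decoded)

# Values are bytes too: strings have to be encoded explicitly.
name = "Ada".encode("utf-8")
print(name, name.decode("utf-8"))
```

Since rows are kept in key order, the choice of key encoding decides which rows end up next to each other.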

A cell value is identified by a row key, a column family with a column qualifier, and a timestamp that serves as the version.

The starting identifier is a row key.

Column families are associated with column qualifiers.

Each cell has a timestamp and an associated value.
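The addressing scheme described above (row key, column family, column qualifier, timestamp) can be sketched as a nested map. This is a simplified model under assumptions: the `VersionedCells` class and its methods are hypothetical and do not mirror the HBase client API.

```python
class VersionedCells:
    """Toy store: (row, family, qualifier) -> {timestamp: value}."""

    def __init__(self):
        self.cells = {}

    def put(self, row, family, qualifier, value, timestamp):
        key = (row, family, qualifier)
        self.cells.setdefault(key, {})[timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest value, or the newest one at or before
        `timestamp` when it is given; None if nothing matches."""
        versions = self.cells.get((row, family, qualifier), {})
        candidates = [ts for ts in versions
                      if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


store = VersionedCells()
store.put(b"u1", b"info", b"name", b"Ada", timestamp=100)
store.put(b"u1", b"info", b"name", b"Ada L.", timestamp=200)

latest = store.get(b"u1", b"info", b"name")
as_of_150 = store.get(b"u1", b"info", b"name", timestamp=150)
before_any = store.get(b"u1", b"info", b"name", timestamp=50)
print(latest, as_of_150, before_any)
```

A plain read returns the newest version, while passing a timestamp asks for the state of the cell as of that moment.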

Data Storage

Interface

  • Java Client
  • Non-Java clients: Thrift or REST server.
  • HBase Shell
  • Hive, Pig, HCatalog, and Hue.