Hbase - noonecare/opensourcebigdatatools GitHub Wiki
Application
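For massive data sets whose schema is not known in advance, where inserts, deletes, updates, and queries must be served in real time.
Supports CRUD (create, read, update, delete).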
HBase Architecture
- Master Node
The Master Node assigns regions to RegionServers and handles load balancing. An HBase cluster has only one active master node at a time.
- RegionServers
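Tables are actually stored in Regions, which are hosted on RegionServers. Each Region contains three parts: the HLog (the write-ahead log), the memstore (an in-memory cache), and the HStore (also called HFile; it stores the data and is kept on HDFS). An HFile is a key-value map.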
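The three parts of a Region described above can be sketched as a toy model. This is only an illustration of how a log, an in-memory store, and flushed files fit together, not HBase code: the class `ToyRegion`, its method names, and the entry-count `flush_threshold` are all assumptions made for brevity.

```python
class ToyRegion:
    """Toy model of one Region: a log, an in-memory store, and flushed files."""

    def __init__(self, flush_threshold=3):
        self.hlog = []        # append-only record of every write (stands in for the HLog)
        self.memstore = {}    # recent writes held in memory
        self.hfiles = []      # flushed key-value maps (stand in for HFiles on HDFS)
        self.flush_threshold = flush_threshold  # hypothetical: counted in entries

    def put(self, key: bytes, value: bytes) -> None:
        # Log first, so the write could be replayed if the memstore were lost.
        self.hlog.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the memstore out as a key-sorted map and start a fresh memstore.
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key: bytes):
        # Newest data wins: check the memstore, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            if key in hfile:
                return hfile[key]
        return None


region = ToyRegion(flush_threshold=2)
region.put(b"row1", b"a")
region.put(b"row2", b"b")   # second entry reaches the threshold and triggers a flush
region.put(b"row1", b"c")   # newer value stays in the memstore for now
```

After these three writes the toy region holds one flushed file, one entry in the memstore, and three log records; a read of `b"row1"` is answered from the memstore, while `b"row2"` is found in the flushed file.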
Storage Model
Data Model
Tables are stored by row.
Column families must be defined when the table is created.
- Each family consists of any number of columns.
- Each column consists of any number of versions.
- Columns exist only once a value is inserted, so NULLs cost no storage.
- Columns in a family are sorted and stored together.
Everything except table names is stored as byte arrays.
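Because names and values are byte arrays, sorting is a comparison of bytes rather than of numbers or locale-aware strings. A small sketch of what that implies, assuming a plain byte-wise (lexicographic) comparison; the sample qualifiers and the fixed-width encoding are illustrative choices, not something prescribed by these notes:

```python
import struct

# Text qualifiers encoded as bytes sort by byte value, not by numeric value.
qualifiers = ["9", "10", "a", "B"]
sorted_text = sorted(q.encode("utf-8") for q in qualifiers)

# One option for keeping numeric order: fixed-width big-endian integers.
numbers = [9, 10, 256, 1]
sorted_packed = sorted(struct.pack(">q", n) for n in numbers)
decoded = [struct.unpack(">q", b)[0] for b in sorted_packed]
```

Here `b"10"` sorts before `b"9"` and uppercase `b"B"` before lowercase `b"a"`, while the packed integers come back in numeric order.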
A value is identified by a row key, a column family with a column qualifier, and a timestamp that serves as its version.
The starting identifier is a row key.
Column families are associated with column qualifiers.
Each cell has a timestamp and an associated value.
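The addressing scheme above (row key, then column family, then column qualifier, then timestamp) can be pictured as nested maps. The following is a minimal sketch under that assumption; `ToyTable` and its methods are hypothetical names, not the HBase client API, and `delete` simply drops the column as a simplification.

```python
import time


class ToyTable:
    """Nested-map model: row key -> family -> qualifier -> {timestamp: value}."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are fixed at creation
        self.rows = {}

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        if ts is None:
            ts = time.time_ns() // 1_000_000  # default to current time in ms
        versions = (
            self.rows.setdefault(row, {})
            .setdefault(family, {})
            .setdefault(qualifier, {})
        )
        versions[ts] = value  # each timestamp is a separate version

    def get(self, row, family, qualifier, ts=None):
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None  # a column that was never inserted simply does not exist
        if ts is None:
            ts = max(versions)  # newest version by default
        return versions.get(ts)

    def delete(self, row, family, qualifier):
        self.rows.get(row, {}).get(family, {}).pop(qualifier, None)


table = ToyTable("users", [b"info"])
table.put(b"row1", b"info", b"email", b"old@example.com", ts=1)
table.put(b"row1", b"info", b"email", b"new@example.com", ts=2)
latest = table.get(b"row1", b"info", b"email")
older = table.get(b"row1", b"info", b"email", ts=1)
missing = table.get(b"row1", b"info", b"phone")
```

Columns are created on first insert inside a family that was declared up front, and reading without a timestamp returns the newest version.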
Data Storage
Interface
- Java Client
- Non-Java clients: Thrift or REST server.
- HBase Shell
- Hive, Pig, HCatalog, and Hue.