QA HBase MOB - cto-bdt-qa/bdt-qa GitHub Wiki

Introduction

HBase MOB is an optimized solution for storing records that contain MOB (Medium Object) files in HBase.

MOB Definition

MOB: Medium object. It usually refers to BMOB (binary medium object) and CMOB (character medium object). A MOB can be a PDF document, Word document, image, multimedia object, etc. Unlike typical records, a MOB is typically several hundred KB to several MB in size (100 KB to 5 MB).
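The size range above can be expressed as a simple check, which is handy when generating test payloads. This is a minimal sketch: the function name `is_mob_sized` and the choice of 1 KB = 1024 bytes are assumptions for illustration and are not taken from the HBase MOB implementation.

```python
KB = 1024          # assumption: binary kilobytes
MB = 1024 * KB

# Bounds taken from the "100 KB to 5 MB" range quoted in the definition.
MOB_MIN_BYTES = 100 * KB
MOB_MAX_BYTES = 5 * MB


def is_mob_sized(value: bytes) -> bool:
    """Return True when the payload length falls inside the MOB size range."""
    return MOB_MIN_BYTES <= len(value) <= MOB_MAX_BYTES
```

A test data generator could use this check to confirm that each generated payload really exercises the MOB path rather than the ordinary record path.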

MOB Characteristics

MOB data is typically stored together with its metadata. For example, in the ITS (Intelligent Transportation System) field, the metadata (car speed, color, direction, plate number, etc.) of a picture taken by a lane camera is extracted before the MOB (the picture) is stored, and the metadata is stored together with the MOB.

Write characteristics:

  • Write performance is quite critical. MOBs and their metadata need to be stored efficiently.
  • The total data size is quite big.
  • MOBs are seldom updated.

Read characteristics:

  • MOBs are accessed much less often than their corresponding metadata. Typically the metadata is accessed for analysis; MOBs are kept as archives and accessed only when a user explicitly requests them.
  • MOBs are typically accessed randomly. Users typically don't scan all MOBs in one batch.

MOB Design Goals

  • High performance: (1) Write: Stable low latency and high throughput. Minimize the impact of MOB data on HBase during splits, compactions, etc. (2) Read: Low read latency and good concurrent read performance. Fast scan on the metadata, and fast random read on the MOB data.
  • Consistency for MOB data read/write.
  • Transparency when reading and writing MOB data.

Related Document

| Document | Description |
|---|---|
| HBASE-11339 | JIRA entry for the MOB feature in the Apache HBase project |
| Design Doc | Design document for MOB, which was submitted to JIRA entry HBASE-11339 |
| Current Dev Branch | Dev branch on GitHub (intel-hadoop account): https://github.com/intel-hadoop/HBase-LOB |

Schedule

| QA Start Date | QA Plan Ready | Functionality Test Case Ready | Performance Test Case Ready | QA End Date |
|---|---|---|---|---|
|  |  |  |  |  |

Test Category/Scenario Summary

| Test Category | Test Scenario | Description | Planned Test Case No. | Test Case Dev Status | Test Result |
|---|---|---|---|---|---|
| Functionality Test | HTable API Test (CRUD, Batch, Scan etc.) | Test MOB functionality via the HTable API: CRUD (Put, Get, Delete, Append, Increment etc.); Batch; Scan (Scan, Scan With Filter), to validate whether there is any conformance issue when MOB is used. | 14 | - | - |
| | HBase Administration Operation Test | Test MOB functionality via HBase administration operations: Table Create/Drop/Enable/Disable; Alter Table Schema; Compaction; Split, to validate whether there is any conformance issue when MOB is used. | 10 | - | - |
| | MOB Housekeeping Test | Validate the correctness of the two housekeeping methods that MOB supports: (1) delete expired MOB files according to the TTL in the column family (triggered by compaction); (2) use the sweeper tool to clean the unused and expired MOB data (triggered by the user). | 8 | - | - |
| Performance Test | MOB data write performance | Collect the performance data of MOB table data write for different MOB data sizes (e.g. 100K, 3M, 5M). Compare it with non-MOB table data write. | 4 | - | - |
| | MOB data random read performance | Collect the performance data of MOB table data random read for different MOB data sizes (e.g. 100K, 3M, 5M). Compare it with non-MOB table data random read. | 4 | - | - |
| | Metadata data scan | Collect the performance data of metadata data scan for different corresponding MOB data sizes (e.g. 100K, 3M, 5M). Compare it with non-MOB table data scan. | 4 | - | - |
| Failure Test | HBase Region Server Down | Validate the correctness of all the functionality test cases when 1~2 HBase region servers are down (assuming a data replication factor of 3). | 32 | - | - |
| | HDFS Datanode Down | Validate the correctness of all the functionality test cases when 1~2 HDFS datanode servers are down (assuming a data replication factor of 3). | 32 | - | - |

Functionality Test

HTable API Test

| # | Test Suite | Test Case | Description | Covered HTable Operation | Test Result |
|---|---|---|---|---|---|
| 1 | CRUD Test | testPutDataBasic | Validate the basic functionality for MOB Put/Get. | Put, Get | - |
| 2 | | testPutDataTwice | Validate the functionality for MOB Put/Get by putting to one row twice. | Put, Get | - |
| 3 | | testCheckAndPut | Validate the functionality for MOB CheckAndPut: put one row first, use checkAndPut to update it, and then check the updated result. | CheckAndPut, Get | - |
| 4 | | testCheckAndDelete | Validate the functionality for MOB CheckAndDelete: put one row first, use checkAndDelete to delete it, and then check the result. | CheckAndDelete, Get | - |
| 5 | | testAppend | Validate the functionality for MOB Append: put one row first, use Append to update it, and then check the updated result. | Append, Get | - |
| 6 | | testMutateRowPut | Validate the functionality for MOB MutateRow (Put). | MutateRowAdd, Get | - |
| 7 | | testMutateRowDelete | Validate the functionality for MOB MutateRow (Delete). | MutateRowDelete, Get | - |
| 8 | | testIncrement | Compatibility test only: check whether Increment is affected when a column family has MOB enabled. | Increment, Get | - |
| 9 | | testIncrementColumnValueWithLog | Compatibility test only: check whether IncrementColumnValue is affected when a column family has MOB enabled. | incrementColumnValue, Get | - |
| 10 | | testIncrementColumnValueOverflow | Compatibility test only: check whether IncrementColumnValue is affected when a column family has MOB enabled. | incrementColumnValue, Get | - |
| 11 | Batch Test | testBatch | Validate the functionality for MOB Put/Get/Append/Increment etc. by calling the HTable batch API. | Batch (Put, Append, Increment, Get etc.) | - |
| 12 | Scan Test | testScanByQualifier | Validate the functionality for MOB Scan by column qualifier. | Put, Scan | - |
| 13 | | testScanWithFilter | Validate the functionality for MOB Scan by setting a specific filter for the scan. | Put, Scan | - |
| 14 | | testPutAndScanForMultipleMOBs | Perform multiple put operations to insert multiple MOBs into different rows, and then use scan to query the result. | Put, Scan | - |
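
The testCheckAndPut flow described above (put a row, update it with checkAndPut, then read it back) can be sketched against an in-memory stand-in. `FakeTable` and `run_check_and_put_case` are hypothetical names invented for this illustration; they are not the HBase client API and only model the compare-and-set semantics that the test case is expected to verify.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[bytes, bytes, bytes]  # (row, family, qualifier)


class FakeTable:
    """Hypothetical in-memory table; NOT the HBase client API."""

    def __init__(self) -> None:
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: bytes, family: bytes, qualifier: bytes, value: bytes) -> None:
        self._cells[(row, family, qualifier)] = value

    def get(self, row: bytes, family: bytes, qualifier: bytes) -> Optional[bytes]:
        return self._cells.get((row, family, qualifier))

    def check_and_put(self, row: bytes, family: bytes, qualifier: bytes,
                      expected: bytes, value: bytes) -> bool:
        """Write `value` only if the current cell equals `expected`."""
        if self.get(row, family, qualifier) != expected:
            return False
        self.put(row, family, qualifier, value)
        return True


def run_check_and_put_case(table: FakeTable, mob_size: int = 100 * 1024) -> bool:
    """Put a MOB-sized value, update it conditionally, and verify the result."""
    row, family, qualifier = b"row1", b"mob", b"pic"
    original = b"a" * mob_size
    updated = b"b" * mob_size
    table.put(row, family, qualifier, original)
    # A stale expected value must be rejected and leave the cell untouched.
    rejected = not table.check_and_put(row, family, qualifier, updated, updated)
    # The matching expected value must be accepted.
    accepted = table.check_and_put(row, family, qualifier, original, updated)
    return rejected and accepted and table.get(row, family, qualifier) == updated
```

In a real test the same three steps would run against a MOB-enabled column family, and the expected values would stay the same because MOB is meant to be transparent to the caller.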

HBase Administration Operation Test

| # | Test Suite | Test Case | Test Result |
|---|---|---|---|
| 1 | Table Create/Drop/Enable/Disable | Create Table Normally | - |
| 2 | | Drop Table Normally | - |
| 3 | | Create Table Twice | - |
| 4 | | Drop Table Twice | - |
| 5 | | Re-create Table (Create -> Drop -> Create) | - |
| 6 | | Enable/Disable Table Normally | - |
| 7 | Alter Table Schema | Column Family Modification | - |
| 8 | Compaction | Major Compaction | - |
| 9 | | Minor Compaction | - |
| 10 | Split | Table Split | - |

Housekeeping Test

| # | Test Suite | Test Case | Test Result |
|---|---|---|---|
| 1 | Delete expired MOB files according to the TTL | MOB files are expired | - |
| 2 | | MOB files are actually NOT expired | - |
| 3 | Use sweeper tool to clean the unused and expired MOB data | Start sweeper tool to do housekeeping (MOB files are expired) | - |
| 4 | | Start sweeper tool to do housekeeping (MOB files are actually NOT expired) | - |
| 5 | | Perform update operation while the sweeper tool is running | - |
| 6 | | Perform delete operation while the sweeper tool is running | - |
| 7 | | Perform delete & major compaction while the sweeper tool is running | - |
| 8 | | Perform major compaction while the sweeper tool is running | - |
| 9 | | Start sweeper tool during major compaction | - |
| 10 | | There is no unused or expired MOB file | - |
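
To derive expected results for the housekeeping cases above, a simplified model of "expired" and "unused" can help. The sketch below is an assumption-laden illustration: `MobFile`, `is_expired`, and `files_to_clean` are hypothetical names, expiry is modeled as file age strictly greater than the TTL, and "unused" as a file that no cell references. The real HBase MOB and sweeper logic may decide differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class MobFile:
    name: str
    created_at: float  # seconds since epoch (assumed representation)


def is_expired(mob_file: MobFile, ttl_seconds: float, now: float) -> bool:
    """Assumed rule: a file expires once its age exceeds the column family TTL."""
    return now - mob_file.created_at > ttl_seconds


def files_to_clean(files: Iterable[MobFile], referenced: Set[str],
                   ttl_seconds: float, now: float) -> List[str]:
    """Names of files that are expired or no longer referenced by any cell."""
    return sorted(
        f.name for f in files
        if is_expired(f, ttl_seconds, now) or f.name not in referenced
    )
```

For the "no unused or expired MOB file" case, this model returns an empty list, which matches the expectation that housekeeping leaves every file in place.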

Performance Test

Methodology: Develop test plugins and add them to the SOAK reliability/performance test framework.

The following test scenarios will be covered:

MOB Data Write

| # | Test Case | Throughput |
|---|---|---|
| 1 | MOB (Data Size: 100K) | - |
| 2 | MOB (Data Size: 3M) | - |
| 3 | MOB (Data Size: 5M) | - |
| 4 | non-MOB | - |
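
Write throughput for each payload size can be collected with a small timing harness. The sketch below is not the SOAK plugin interface, which this page does not describe; `measure_write_throughput` is a hypothetical helper, and `write` stands for whatever callable performs one put against the table under test.

```python
import time
from typing import Callable, Dict


def measure_write_throughput(write: Callable[[bytes], None],
                             payload_size: int, count: int) -> Dict[str, float]:
    """Time `count` writes of `payload_size` bytes and report throughput."""
    payload = b"\0" * payload_size
    start = time.perf_counter()
    for _ in range(count):
        write(payload)
    # Guard against a zero interval on very fast or coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    total_mb = count * payload_size / (1024 * 1024)
    return {
        "ops_per_sec": count / elapsed,
        "mb_per_sec": total_mb / elapsed,
    }
```

Running the same harness with a non-MOB table as the write target gives the baseline row of the table.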

MOB Data Random Read

| # | Test Case | Latency |
|---|---|---|
| 1 | MOB (Data Size: 100K) | - |
| 2 | MOB (Data Size: 3M) | - |
| 3 | MOB (Data Size: 5M) | - |
| 4 | non-MOB | - |
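
A single latency figure per test case hides tail behavior, so it may be useful to record percentiles as well as the mean. The following is a minimal sketch using the nearest-rank percentile method; `percentile` and `summarize_latency` are hypothetical helpers, and the choice of p50/p95/p99 is an assumption rather than something this plan specifies.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of `samples` for an integer `pct` in 1..100."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latency(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize per-request latencies (milliseconds) for one test case."""
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }
```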

Metadata Data Scan

| # | Test Case | Latency |
|---|---|---|
| 1 | MOB (Data Size: 100K) | - |
| 2 | MOB (Data Size: 3M) | - |
| 3 | MOB (Data Size: 5M) | - |
| 4 | non-MOB | - |

Failure Test

The failure test validates the correctness of all the MOB functionality test cases when some non-critical service is down on a cluster node (e.g. an HBase region server or an HDFS datanode).

| # | Test Scenario | Functionality Test Cases to Validate | Test Result |
|---|---|---|---|
| 1 | HBase Region Server Down | HTable CRUD Test | - |
| 2 | | HBase Administration Operation Test | - |
| 3 | | MOB Housekeeping Test | - |
| 4 | HDFS Datanode Server Down | HTable CRUD Test | - |
| 5 | | HBase Administration Operation Test | - |
| 6 | | MOB Housekeeping Test | - |
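
The failure test matrix is the cross product of failure scenarios and functionality suites, and the planned count of 32 per scenario is the sum of the planned functionality counts (14 + 10 + 8). The sketch below reproduces both; the per-suite counts come from the summary table, and mapping "HTable CRUD Test" to the 14 HTable API cases is an assumption.

```python
from itertools import product
from typing import List, Tuple

FAILURE_SCENARIOS = ["HBase Region Server Down", "HDFS Datanode Server Down"]

# Planned case counts per functionality suite, from the summary table.
SUITE_CASE_COUNTS = {
    "HTable CRUD Test": 14,  # assumed to cover the 14 HTable API cases
    "HBase Administration Operation Test": 10,
    "MOB Housekeeping Test": 8,
}


def failure_matrix() -> List[Tuple[int, str, str]]:
    """Numbered (scenario, suite) pairs in the order used by the table."""
    pairs = product(FAILURE_SCENARIOS, SUITE_CASE_COUNTS)
    return [(i, scenario, suite) for i, (scenario, suite) in enumerate(pairs, start=1)]


def cases_per_scenario() -> int:
    """Total functionality cases to re-run under one failure scenario."""
    return sum(SUITE_CASE_COUNTS.values())
```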

Issue Tracking
