•Read about Phoenix (see References below), a SQL interface to HBase.
•Phoenix is not required for your solution, since HBase is consumed by microservices; using the native HBase API is the simpler solution.
•However, Phoenix presents a SQL interface that may be more familiar and preferable to developers than the Java HBase API.
•If users ever query HBase directly, Phoenix is preferable (SQL from a SQL client).
•It is worth evaluating its potential use.

Phase 2: POC

•design your keys against actual data and query patterns
•load actual datasets (the key is the first column of the dataset). Note: for large initial loads (>50-100GB), consider bulk load as described below
•consider multiple tables, each with the same data but a different key design (each table and key design serves a specific query pattern)
•use the HBase shell (CLI) to do sanity tests
•integrate the native Java API (or Phoenix JDBC) into the microservices framework for final integration
•optimize HBase configs if needed (see References)
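
The key-design bullets above can be sketched in plain Java. This is a minimal illustration, not HBase client code: a `TreeMap` stands in for an HBase table (both keep rows sorted by key), and the field names, the `|` delimiter and the two key layouts are assumptions made up for the example. It shows the same events stored under two key designs, each serving one query pattern with a prefix scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

final class KeyDesignSketch {

    // Hypothetical key for "latest events of one customer":
    // customer id first, then a reversed timestamp so newer rows sort first.
    static String byCustomer(String customerId, long epochMillis) {
        return String.format(Locale.ROOT, "%s|%019d", customerId, Long.MAX_VALUE - epochMillis);
    }

    // Hypothetical key for "all events of one type, oldest first".
    static String byType(String eventType, long epochMillis, String customerId) {
        return String.format(Locale.ROOT, "%s|%019d|%s", eventType, epochMillis, customerId);
    }

    // Prefix scan over a sorted map; stands in for an HBase scan bounded by start and stop rows.
    // Assumes ASCII keys, where String ordering matches byte ordering.
    static List<String> prefixScan(TreeMap<String, String> table, String prefix) {
        return new ArrayList<>(table.subMap(prefix, prefix + Character.MAX_VALUE).values());
    }

    public static void main(String[] args) {
        TreeMap<String, String> eventsByCustomer = new TreeMap<>();
        TreeMap<String, String> eventsByType = new TreeMap<>();

        // Made-up sample rows: customer id, timestamp, event type.
        String[][] events = {
            {"c1", "1000", "login"},
            {"c1", "2000", "purchase"},
            {"c2", "1500", "login"}
        };
        for (String[] e : events) {
            long ts = Long.parseLong(e[1]);
            String value = e[0] + ":" + e[2] + "@" + ts;
            // Same data written to both "tables", each with its own key design.
            eventsByCustomer.put(byCustomer(e[0], ts), value);
            eventsByType.put(byType(e[2], ts, e[0]), value);
        }

        System.out.println(prefixScan(eventsByCustomer, "c1|"));   // [c1:purchase@2000, c1:login@1000]
        System.out.println(prefixScan(eventsByType, "login|"));    // [c1:login@1000, c2:login@1500]
    }
}
```

Zero-padding the numeric fields to a fixed width keeps lexicographic order equal to numeric order, which is what makes the range scan return rows in the intended sequence.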

**Phase 3:**

Load data from Hive into HBase using Hive SQL.

**Long Key vs Secondary Index**

Every column value is stored together with its row key, so long keys result in higher storage on disk and in memory than shorter keys. During insert, larger keys take up more space in the memstore, which means it will be flushed, and the resulting files compacted, more frequently than with shorter keys. Overall, longer keys will occupy more space in HDFS.
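
A rough back-of-the-envelope estimate can make this cost concrete. The sketch below assumes the classic uncompressed KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, plus the row key, family, qualifier and value bytes). It ignores tags, block encoding, compression, HFile indexes and HDFS replication, and the row counts and field sizes are invented for illustration.

```java
final class CellSizeEstimate {

    // Fixed bytes per cell in the assumed layout:
    // key length (4) + value length (4) + row length (2) + family length (1) + timestamp (8) + type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Approximate raw size of one cell; the row key is repeated in every cell of the row.
    static long cellBytes(int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD + rowKeyLen + familyLen + qualifierLen + valueLen;
    }

    // Approximate raw size of a table where every row has the same number of columns.
    static long tableBytes(long rows, int columnsPerRow,
                           int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return rows * columnsPerRow * cellBytes(rowKeyLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        long rows = 100_000_000L;   // hypothetical row count
        int columns = 10;           // hypothetical columns per row

        long shortKey = tableBytes(rows, columns, 16, 1, 4, 8);  // 16-byte row key
        long longKey = tableBytes(rows, columns, 64, 1, 4, 8);   // 64-byte row key

        System.out.printf("16-byte key: %d bytes%n", shortKey);
        System.out.printf("64-byte key: %d bytes%n", longKey);
        System.out.printf("ratio: %.2f%n", (double) longKey / shortKey);
    }
}
```

With small values, the row key dominates the cell size, so the key length matters more as the number of columns per row grows. Block encodings and compression reduce the on-disk figure, so treat these numbers as an upper bound on raw size rather than a prediction.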

If all of this is true, the impact of long keys will be felt at the initial load and not thereafter.
In this case, you should use bulk load via MapReduce (https://phoenix.apache.org/bulk_dataload.html) and test it in development. To be safe, bulk load your data in batches and use salting. Also be sure to set DATA_BLOCK_ENCODING='FAST_DIFF'. See this reference: https://phoenix.apache.org/tuning_guide.html
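
To illustrate what salting does, here is a minimal sketch in plain Java. Phoenix applies salting itself when a table is created with the `SALT_BUCKETS` option, so this code is only a conceptual model: the hash function, the class name and the sample keys are assumptions, not Phoenix's implementation. The idea is to prepend one byte derived from a hash of the row key, so that sequential keys spread across buckets instead of all landing in the same region.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SaltedKeySketch {

    // Bucket derived from a hash of the whole key; stand-in hash, not the one Phoenix uses.
    static int bucketFor(byte[] rowKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256 to fit in one byte");
        }
        return Math.floorMod(Arrays.hashCode(rowKey), buckets);
    }

    // Prepend the bucket as a single salt byte; the original key follows unchanged.
    static byte[] salt(byte[] rowKey, int buckets) {
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = (byte) bucketFor(rowKey, buckets);
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }

    public static void main(String[] args) {
        int buckets = 8;                       // hypothetical bucket count
        int[] rowsPerBucket = new int[buckets];

        // Sequential keys, the typical hotspot case during a bulk load.
        for (int i = 0; i < 10_000; i++) {
            byte[] key = String.format("20240101-%06d", i).getBytes(StandardCharsets.UTF_8);
            rowsPerBucket[salt(key, buckets)[0]]++;
        }
        System.out.println(Arrays.toString(rowsPerBucket));
    }
}
```

The trade-off is that a range scan over the original key order now has to read every bucket and merge the results, so salting helps write distribution at some cost to sequential reads.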

**Primary and Secondary Index**

Code example:
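
What follows is a conceptual sketch rather than HBase or Phoenix client code. It assumes a hypothetical orders table whose row key (the primary index) is the order id, and models a secondary index as a second sorted table keyed by `customer|orderId` that points back to the row key. Two `TreeMap`s stand in for the two tables; all names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class SecondaryIndexSketch {

    // Data table: row key (order id) -> customer. Lookups by row key are the primary access path.
    private final TreeMap<String, String> dataByOrderId = new TreeMap<>();

    // Index table: "customer|orderId" -> orderId. Sorted by customer, so a prefix scan finds all orders.
    private final TreeMap<String, String> indexByCustomer = new TreeMap<>();

    // Every write updates both tables; a stale index entry is removed when the customer changes.
    void put(String orderId, String customer) {
        String previous = dataByOrderId.put(orderId, customer);
        if (previous != null) {
            indexByCustomer.remove(previous + "|" + orderId);
        }
        indexByCustomer.put(customer + "|" + orderId, orderId);
    }

    // Primary index lookup: direct get by row key.
    String customerOf(String orderId) {
        return dataByOrderId.get(orderId);
    }

    // Secondary index lookup: prefix scan on the index table instead of a full scan of the data table.
    List<String> orderIdsFor(String customer) {
        String prefix = customer + "|";
        return new ArrayList<>(indexByCustomer.subMap(prefix, prefix + Character.MAX_VALUE).values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch orders = new SecondaryIndexSketch();
        orders.put("o1", "alice");
        orders.put("o2", "bob");
        orders.put("o3", "alice");

        System.out.println(orders.customerOf("o2"));      // bob
        System.out.println(orders.orderIdsFor("alice"));  // [o1, o3]
    }
}
```

The sketch shows the cost side as well: each write touches two tables, which is the write amplification a secondary index trades for faster reads on the non-key column.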

REFERENCES

Best single source (worth getting to come up to speed with HBase)
•HBase: The Definitive Guide

HBase overviews
•http://hortonworks.com/webinar/hbase-for-mission-critical-applications/
•https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_data-access/content/ch05.html

Row key design (subsecond responses require proper keys, which can be composite)
•https://community.hortonworks.com/questions/101309/hbase-schema-design-for-complex-data.html
•https://www.youtube.com/watch?v=_HLoH_PgrLk

ACID semantics (HBase locks at the row level)
•https://hbase.apache.org/acid-semantics.html

Java API
•http://hbase.apache.org/book.html#jdo
•http://www.baeldung.com/hbase

Shell API
•http://hbase.apache.org/book.html#shell

Bulk loading data
•https://community.hortonworks.com/articles/4942/import-csv-data-into-hbase-using-importtsv.html
•https://hbase.apache.org/book.html#importtsv (sections 145.11 - 145.12)

Phoenix SQL interface
•https://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html
•https://www.slideshare.net/Hadoop_Summit/w-145p230-ataylorv2