Communicating with the data warehouse and creating custom Hive views
Introduction
Below is a brief overview of how HSLynk creates custom Hive/Impala views based on the HMIS, CES, and general human services data for each customer.
Technical Overview
The primary technology behind the data warehouse is Hadoop. We currently use a Cloudera Hadoop cluster secured with LDAP and Sentry. The data is stored in HBase (on HDFS), and we perform real-time analytics on the loaded data by creating external tables in Hive/Impala.
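As a rough illustration of the external-table approach described above, the sketch below builds a Hive `CREATE EXTERNAL TABLE` statement that maps onto an existing HBase table through Hive's HBase storage handler. The schema, table, column, and column-family names (`customer_schema.client`, `client_customer`, `CF`) are hypothetical placeholders and are not taken from the HSLynk code base; the helper only produces the DDL text and does not connect to a cluster.

```python
def hbase_external_table_ddl(hive_table, hbase_table, key_column, columns):
    """Build a Hive DDL statement for an external table backed by HBase.

    `columns` maps each Hive column name to a (hive_type, "family:qualifier")
    pair. All names passed in here are illustrative assumptions.
    """
    # The HBase row key is always mapped first, via the special ":key" entry.
    column_defs = [f"{key_column} STRING"]
    mapping = [":key"]
    for name, (hive_type, hbase_column) in columns.items():
        column_defs.append(f"{name} {hive_type}")
        mapping.append(hbase_column)

    columns_sql = ",\n  ".join(column_defs)
    mapping_sql = ",".join(mapping)
    return "\n".join([
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {hive_table} (",
        f"  {columns_sql}",
        ")",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping_sql}')",
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')",
    ])


# Hypothetical example: expose two HBase columns as a Hive table.
ddl = hbase_external_table_ddl(
    hive_table="customer_schema.client",
    hbase_table="client_customer",
    key_column="id",
    columns={
        "first_name": ("STRING", "CF:first_name"),
        "dob": ("STRING", "CF:dob"),
    },
)
print(ddl)
```

Because the table is external, dropping it in Hive removes only the metadata entry; the underlying HBase data is left in place.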
Custom Hive/Impala View for HSLynk and CES
The following project contains the code that populates data for our custom HSLynk and CES views. Two frequently used views, VI-SPDAT and CES Active List, are here: https://github.com/servinglynk/hslynk-open-source/tree/master/sync-general
Conclusion
Although we use Impala to populate the data into HBase, we usually create the views in Hive. Because Impala and Hive share the same metadata, views created in Hive can also be queried from Impala.
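A minimal sketch of that workflow, under stated assumptions: the view is created with a Hive `CREATE VIEW` statement, and Impala is then asked to reload its cached metadata with `INVALIDATE METADATA` so that the new view becomes visible there. The view name and the `SELECT` body are hypothetical and do not reflect the actual VI-SPDAT or CES Active List definitions; the helper only returns the two statements as text.

```python
def hive_view_statements(view_name, select_sql):
    """Return (hive_sql, impala_sql) for creating a view in Hive and then
    making it visible to Impala. Names used here are illustrative only."""
    # Statement to run in Hive: defines the view in the shared metastore.
    hive_sql = f"CREATE VIEW IF NOT EXISTS {view_name} AS\n{select_sql.strip()}"
    # Statement to run in Impala afterwards: reloads metadata for the view.
    impala_sql = f"INVALIDATE METADATA {view_name}"
    return hive_sql, impala_sql


# Hypothetical example view over a hypothetical external table.
hive_sql, impala_sql = hive_view_statements(
    "customer_schema.ces_active_list",
    """
    SELECT id, first_name
    FROM customer_schema.client
    """,
)
print(hive_sql)
print(impala_sql)
```

The two statements are kept separate because they would be submitted to different engines: the first to Hive, the second to Impala.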