Communicating with the data warehouse and creating custom Hive views

Introduction

Below is a brief overview of how HSLynk creates custom Hive/Impala views based on the HMIS, CES, and general human services data for each customer.

Technical Overview

The primary technology behind the data warehouse is Hadoop. We currently use a Cloudera Hadoop cluster secured with LDAP and Sentry. The data is stored in HBase (on HDFS), and we run real-time analytics on the loaded data by creating external tables over it in Hive/Impala.
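As an illustration of the external-table approach, here is a minimal sketch in Python that builds the HiveQL for an external table mapped onto an existing HBase table through Hive's HBase storage handler. The table names, column names, and the single column family `cf` are hypothetical and are not taken from the HSLynk schema; the helper only generates the statement and does not connect to the cluster.

```python
def hbase_external_table_ddl(hive_table, hbase_table, key_column, columns, family="cf"):
    """Build HiveQL for an external Hive table backed by an existing HBase table.

    columns is a list of (name, hive_type) pairs. All of them are assumed to
    live in one column family (hypothetical default: "cf").
    """
    # The HBase row key is exposed as the first Hive column.
    col_defs = [f"{key_column} STRING"] + [f"{name} {hive_type}" for name, hive_type in columns]
    # ":key" maps the row key; every other entry is "family:qualifier".
    mapping = ",".join([":key"] + [f"{family}:{name}" for name, _ in columns])
    lines = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {hive_table} (",
        ",\n".join(f"  {c}" for c in col_defs),
        ")",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')",
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')",
    ]
    return "\n".join(lines)


# Hypothetical example: a "client" table with two columns.
ddl = hbase_external_table_ddl(
    hive_table="client",
    hbase_table="client",
    key_column="id",
    columns=[("first_name", "STRING"), ("dob", "STRING")],
)
print(ddl)
```

The generated statement can then be run from a Hive client such as beeline. Because the table is external, dropping it in Hive removes only the metadata and leaves the HBase data in place.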

Custom Hive/Impala View for HSLynk and CES

The following project contains the code that populates our custom HSLynk and CES views, including two frequently used views, VI-SPDAT and CES Active List: https://github.com/servinglynk/hslynk-open-source/tree/master/sync-general

Conclusion

Although we use Impala to populate the data in HBase, we usually create the views in Hive. Because Impala and Hive share the same metadata, a view defined in Hive can also be queried from Impala.
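The shared-metadata workflow can be sketched as follows. This small Python helper produces the statement that defines a view in Hive and the statement that is commonly run in Impala afterwards so that Impala reloads the metadata for the new object. The view name, table name, and query are hypothetical, and whether the explicit `INVALIDATE METADATA` step is needed depends on how the cluster is configured.

```python
def view_statements(view_name, select_sql):
    """Return (hive_statement, impala_statement) for a view created in Hive.

    The first statement is meant for a Hive client; the second is meant for
    impala-shell, to make Impala pick up the newly created view.
    """
    hive_stmt = f"CREATE VIEW IF NOT EXISTS {view_name} AS\n{select_sql}"
    impala_stmt = f"INVALIDATE METADATA {view_name}"
    return hive_stmt, impala_stmt


# Hypothetical example: a view over the "client" table.
hive_stmt, impala_stmt = view_statements(
    "client_view",
    "SELECT id, first_name FROM client",
)
print(hive_stmt)
print(impala_stmt)
```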