1 Introduction - ignacio-alorre/Hive GitHub Wiki
What is Hive?
Hive is a data warehouse infrastructure tool for processing structured and semi-structured data in Hadoop. It sits on top of Hadoop and is mostly used for querying and analyzing large datasets. Previously, you had to write complex MapReduce jobs by hand; now you only need to submit SQL-like queries. Hive uses a language called HiveQL (HQL), which is similar to SQL, and automatically translates HiveQL queries into MapReduce jobs.
What is Hive not?
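To make the idea of a query becoming a MapReduce job more concrete, here is a minimal sketch in plain Python. It does not use Hive or Hadoop and is not how Hive is implemented; it only mimics the map, shuffle, and reduce phases that a simple aggregation query conceptually goes through. The `employees` table, its columns, and the function names are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Hypothetical HiveQL query this sketch mimics (table and columns are made up):
#   SELECT department, COUNT(*) FROM employees GROUP BY department;

# Toy rows standing in for an `employees` table whose data would live in HDFS.
employees = [
    {"name": "Ana", "department": "sales"},
    {"name": "Ben", "department": "engineering"},
    {"name": "Cleo", "department": "sales"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["department"], 1


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which is what COUNT(*) amounts to here."""
    return {key: sum(values) for key, values in grouped.items()}


department_counts = reduce_phase(shuffle_phase(map_phase(employees)))
print(department_counts)
```

With Hive, the one-line query in the comment replaces all three hand-written phases, which is the convenience the paragraph above describes.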
- A relational database
- A system designed for OnLine Transaction Processing (OLTP)
- A language for real-time queries and row-level updates
Features of Hive
- It stores the schema in a database (the metastore) and the processed data in HDFS
- It is designed for OLAP
- It provides a SQL-like query language called HiveQL (HQL)
- It is familiar, fast, scalable and extensible