1 Introduction - ignacio-alorre/Hive GitHub Wiki

What is Hive?

Hive is a data warehouse infrastructure tool to process structured and semi-structured data in Hadoop. It resides on top of Hadoop and it is mostly used for querying and analyzing large datasets. Initially, you have to write complex Map-Reduce jobs, but now you just need to submit merely SQL-like queries. Hive use language called HiveQL (HQL), which is similar to SQL. HiveQL automatically translates SQL-like queries into MapReduce jobs.

What is not?

  • A relational database
  • A design for OnLine Transaction Processing (OLTP)
  • A language for real-time queries and row-level updates

Features of Hive

  • It stores schema in a database and processed data into HDFS
  • It is designed for OLAP
  • It provides SQL type language for querying called HiveQL or HQL
  • It is familiar, fast, scalable and extensible

Sources