Insert | Update | Delete | Merge - ignacio-alorre/Hive GitHub Wiki

Operations

Note: Administrators must use a transaction manager that supports ACID and the ORC file format to use transactions. See Hive Transactions for information about configuring other properties related to use ACID-based transactions. Here how to Create Hive ACID Transaction Table

  • INSERT: Writes data to Apache Hive from values provided in SQL statements
  • UPDATE: Modifies values already written to Hive
  • DELETE: Deletes values already written to Hive
  • MERGE: Streamlines UPDATEs, DELETEs, and change data capture operations by drawing on coexisting tables.

All four statements support auto-commit, which means that each statement is a separate transaction that is automatically committed after the SQL statement is executed.

The INSERT ... VALUES, UPDATE, and DELETE statements require the following property values in the hive-site.xml configuration file:

hive.enforce.bucketing = true
hive.exec.dynamic.partition.mode = nonstrict

INSERT

  • Supports adding multiple values into table columns directly from SQL statements.
  • A valid INSERT statement must provide values for each column in the table. However, users may assign values to columns for which they do not want to assign a value.
  • In addition, the PARTITION clause must be included in the DML.

Syntax

INSERT INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] VALUES values_row [, values_row...]

Examples

INSERT INTO TABLE D_customer_new VALUES 
(8, 'Helcur', 's3'), 
(1, 'Abbel', 's3'),
(2, 'Bernard', 's2'),
(9, 'Jackeline', 's9')

Examples with partitions

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23') VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null)
INSERT INTO TABLE pageviews PARTITION (datestamp) VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21'); 

UPDATE

  • Depending on the condition specified in the optional WHERE clause, an UPDATE statement may affect every row in a table
  • You must have both the SELECT and UPDATE privileges to use this statement.

Syntax

UPDATE tablename SET column = value [, column = value ...] [WHERE expression]; 
  • Limitations of the UPDATE statement:
    • The expression in the WHERE clause must be an expression supported by a Hive SELECT clause
    • Partition and bucket columns cannot be updated
    • Query vectorization is automatically disabled for UPDATE statements. However, updated tables can still be queried using vectorization.
    • Subqueries are not allowed on the right side of the SET statement.

Example

UPDATE D_customer_new SET segment = 's7' WHERE segment = 's3';

DELETE

  • Used to delete data already written to Apache Hive

Syntax

DELETE FROM tablename [WHERE expression]; 
  • Limitations of the DELETE statement:
    • Query vectorization is automatically disabled for the DELETE operation. However, tables with deleted data can still be queried using vectorization.

Example

DELETE FROM D_customer_new WHERE segment = 's3';

MERGE

  • Use the MERGE statement to efficiently perform record-level INSERT, UPDATE, and DELETE operations within Hive tables.

Example

MERGE INTO D_customer
using ( select * from D_customer_new) updt
on updt.id = D_customer.id
when matched then update set name = updt.name, segment = sub.segment
when not matched then insert values (updt.id, updt.name, updt.segment);

Source