Insert | Update | Delete | Merge - ignacio-alorre/Hive GitHub Wiki
Operations
Note: Administrators must use a transaction manager that supports ACID and the ORC file format to use transactions. See Hive Transactions for information about configuring other properties related to use ACID-based transactions. Here how to Create Hive ACID Transaction Table
- INSERT: Writes data to Apache Hive from values provided in SQL statements
- UPDATE: Modifies values already written to Hive
- DELETE: Deletes values already written to Hive
- MERGE: Streamlines UPDATEs, DELETEs, and change data capture operations by drawing on coexisting tables.
All four statements support auto-commit, which means that each statement is a separate transaction that is automatically committed after the SQL statement is executed.
The INSERT ... VALUES, UPDATE, and DELETE statements require the following property values in the hive-site.xml configuration file:
hive.enforce.bucketing = true
hive.exec.dynamic.partition.mode = nonstrict
INSERT
- Supports adding multiple values into table columns directly from SQL statements.
- A valid INSERT statement must provide values for each column in the table. However, users may assign values to columns for which they do not want to assign a value.
- In addition, the PARTITION clause must be included in the DML.
Syntax
INSERT INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] VALUES values_row [, values_row...]
Examples
INSERT INTO TABLE D_customer_new VALUES
(8, 'Helcur', 's3'),
(1, 'Abbel', 's3'),
(2, 'Bernard', 's2'),
(9, 'Jackeline', 's9')
Examples with partitions
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23') VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null)
INSERT INTO TABLE pageviews PARTITION (datestamp) VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
UPDATE
- Depending on the condition specified in the optional WHERE clause, an UPDATE statement may affect every row in a table
- You must have both the SELECT and UPDATE privileges to use this statement.
Syntax
UPDATE tablename SET column = value [, column = value ...] [WHERE expression];
- Limitations of the UPDATE statement:
- The expression in the WHERE clause must be an expression supported by a Hive SELECT clause
- Partition and bucket columns cannot be updated
- Query vectorization is automatically disabled for UPDATE statements. However, updated tables can still be queried using vectorization.
- Subqueries are not allowed on the right side of the SET statement.
Example
UPDATE D_customer_new SET segment = 's7' WHERE segment = 's3';
DELETE
- Used to delete data already written to Apache Hive
Syntax
DELETE FROM tablename [WHERE expression];
- Limitations of the DELETE statement:
- Query vectorization is automatically disabled for the DELETE operation. However, tables with deleted data can still be queried using vectorization.
Example
DELETE FROM D_customer_new WHERE segment = 's3';
MERGE
- Use the MERGE statement to efficiently perform record-level
INSERT
,UPDATE
, andDELETE
operations within Hive tables.
Example
MERGE INTO D_customer
using ( select * from D_customer_new) updt
on updt.id = D_customer.id
when matched then update set name = updt.name, segment = sub.segment
when not matched then insert values (updt.id, updt.name, updt.segment);
Source