Transactions in Hive [ACID]

  • Since Hive 0.13, transactions provide SQL atomicity of operations at the row level rather than at the table or partition level
  • This allows a Hive client to read from a partition at the same time that another Hive client is adding rows to the same partition
  • It also provides a mechanism for streaming clients to rapidly update Hive tables and partitions
  • Each Hive transaction has an identifier, and multiple transactions are grouped into a single transaction batch
  • A client requests a set of transaction IDs after connecting to Hive and then uses those IDs one at a time
  • A client writes one or more records for each transaction and either commits or aborts it before moving on to the next transaction
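
The transaction batch flow described in the list above can be sketched with a small in-memory model. This is a toy illustration only, not the Hive streaming API: `ToyTable`, `ToyTransactionBatch`, and all of their methods are hypothetical names invented for this example, and the "table" is just a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a Hive table or partition."""
    committed: list = field(default_factory=list)
    next_txn_id: int = 1

    def allocate_txn_ids(self, count):
        # Hand out a contiguous block of transaction IDs (one batch).
        ids = list(range(self.next_txn_id, self.next_txn_id + count))
        self.next_txn_id += count
        return ids


class ToyTransactionBatch:
    """Uses a pre-allocated set of transaction IDs, one at a time."""

    def __init__(self, table, size):
        self.table = table
        self.pending_ids = table.allocate_txn_ids(size)
        self.current_id = None
        self.buffer = []

    def begin_next(self):
        if self.current_id is not None:
            raise RuntimeError("commit or abort the open transaction first")
        if not self.pending_ids:
            raise RuntimeError("transaction batch exhausted")
        self.current_id = self.pending_ids.pop(0)
        return self.current_id

    def write(self, record):
        if self.current_id is None:
            raise RuntimeError("no open transaction")
        # Records are buffered until the transaction is committed.
        self.buffer.append((self.current_id, record))

    def commit(self):
        if self.current_id is None:
            raise RuntimeError("no open transaction")
        # All buffered rows become visible together.
        self.table.committed.extend(self.buffer)
        self._close()

    def abort(self):
        # Buffered rows are discarded, so no partial data is left behind.
        self._close()

    def _close(self):
        self.current_id = None
        self.buffer = []


table = ToyTable()
batch = ToyTransactionBatch(table, size=3)  # request three transaction IDs

batch.begin_next()       # transaction 1
batch.write("row-a")
batch.write("row-b")
batch.commit()

batch.begin_next()       # transaction 2
batch.write("row-c")
batch.abort()            # row-c is never made visible

batch.begin_next()       # transaction 3
batch.write("row-d")
batch.commit()

print(table.committed)
```

In this model the aborted transaction still consumes its ID, so the committed rows carry IDs 1 and 3 only. Whether a real deployment behaves the same way in every detail is not something this sketch claims.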

ACID is an acronym for four required traits of database transactions: atomicity, consistency, isolation and durability

  • Atomicity: An operation either succeeds completely or fails. It does not leave partial data
  • Consistency: Once an application performs an operation, the results of that operation are visible to the application in every subsequent operation
  • Isolation: Operations by one user do not cause unexpected side effects for other users
  • Durability: Once an operation is complete, it is preserved in case of machine or system failure
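
As a rough illustration of atomicity and isolation, the sketch below tags every row with the ID of the transaction that wrote it and lets a reader see only rows from transactions that were committed when the reader started. It is a simplified model under stated assumptions: `ToyStore` and its methods are hypothetical, and the sketch does not describe how Hive stores data on disk.

```python
class ToyStore:
    """Hypothetical store where each row is tagged with its writer's transaction ID."""

    def __init__(self):
        self.rows = []          # (txn_id, value) pairs, committed or not
        self.committed = set()  # IDs of committed transactions

    def write(self, txn_id, value):
        self.rows.append((txn_id, value))

    def commit(self, txn_id):
        self.committed.add(txn_id)

    def snapshot(self):
        # A reader fixes the set of committed transactions when it starts.
        return frozenset(self.committed)

    def read(self, snapshot):
        # Only rows written by transactions in the snapshot are visible.
        return [value for txn_id, value in self.rows if txn_id in snapshot]


store = ToyStore()
store.write(1, "a")
store.commit(1)

reader_view = store.snapshot()        # a reader starts here
store.write(2, "b")                   # concurrent writer, not yet committed
during = store.read(reader_view)      # the reader does not see "b"

store.commit(2)
same_reader = store.read(reader_view)        # still the original snapshot
store.write(3, "c")                          # transaction 3 never commits
new_reader = store.read(store.snapshot())    # sees transactions 1 and 2 only

print(during, same_reader, new_reader)
```

The reader that started before transaction 2 committed keeps a stable view (isolation), and the rows of transaction 3 never become visible because it never commits (atomicity).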
