DB best practices - sul-dlss/preservation_catalog GitHub Wiki

Basic database advice / pres cat specific guidelines

Preservation Catalog uses PostgreSQL to store metadata about the last known state of all known copies of preserved SDR objects. It relies heavily on DB-level constraints for keeping data consistent.

The Rails API docs: http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html

  • If possible, avoid declaring a transaction explicitly, e.g. by updating a parent object and its children, declaring autosave: true on the parent for the child relationships, and calling save or save! on the parent object. ActiveRecord will implicitly perform this multi-level save in a transaction.
  • Please use ApplicationRecord.transaction for clarity and consistency.
    • Functionally, it doesn't matter whether .transaction is called on a specific ActiveRecord class or object instance, because the transaction applies to the database connection, and all updates in a given thread of operation will be going over the same database connection (it is possible to configure ActiveRecord to do otherwise, but like most applications, we don't). To reduce confusion, it seems best to just always invoke it via the super-class, so that it's clear that the transaction applies to all object types being grouped under it.
  • If two or more things should fail or succeed together atomically, they should be wrapped in a transaction. E.g. if you're creating a PreservedObject so that there's a master record for the MoabRecord that you'd like to create, those two things should probably be grouped as a transaction so that if the creation of the MoabRecord fails, the creation of the PreservedObject gets rolled back, and we don't have a stray lying around.
  • Don't wrap more things than needed in a transaction. If multiple operations can succeed or fail independently, it's both semantically incorrect and needlessly inefficient to group them in a transaction.
    • Likewise, try not to do any unnecessary processing in the Application.transaction block. It's fine to wrap nested chains of method calls in a transaction, as it might be a pain to decompose your code such that the transaction block literally only contained ActiveRecord operations. At the same time, the longer a transaction is open, the higher the chances that two different updates will try to update the same thing, possibly causing one of the updates to fail. If it's easy to keep something unnecessary out of the transaction block, it'd be wise to do so.
  • In the unlikely event that you're tempted to pro-actively do row-locking, e.g. due to concern about multiple processes updating a shared resource (e.g. if multiple processes were crawling the same storage root and doing moab validation), the Postgres docs seems to advise against that. Instead, specifying transaction isolation level seems to be recommended as the more robust and performant approach.
  • You should likely be catching ActiveRecord exceptions outside of the transaction block, as you likely want to abort the transaction after the first ActiveRecord exception anyway. In other words, wrap transactions in exception handling, and not vice versa. See "Exception handling and rolling back": http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html
  • Use of ActiveRecord enums (http://edgeapi.rubyonrails.org/classes/ActiveRecord/Enum.html):