DB best practices - sul-dlss/preservation_catalog GitHub Wiki

Basic database advice / pres cat specific guidelines

Preservation Catalog uses PostgreSQL to store metadata about the last known state of all known copies of preserved SDR objects. It relies heavily on DB-level constraints for keeping data consistent.

The Rails API docs: http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html

If possible, avoid declaring a transaction explicitly, e.g. by updating a parent object and its children, declaring autosave: true on the parent for the child relationships, and calling save or save! on the parent object. ActiveRecord will implicitly perform this multi-level save in a transaction.
Please use ApplicationRecord.transaction for clarity and consistency.
- Functionally, it doesn't matter whether .transaction is called on a specific ActiveRecord class or object instance, because the transaction applies to the database connection, and all updates in a given thread of operation will be going over the same database connection (it is possible to configure ActiveRecord to do otherwise, but like most applications, we don't). To reduce confusion, it seems best to just always invoke it via the super-class, so that it's clear that the transaction applies to all object types being grouped under it.
If two or more things should fail or succeed together atomically, they should be wrapped in a transaction. E.g. if you're creating a PreservedObject so that there's a master record for the MoabRecord that you'd like to create, those two things should probably be grouped as a transaction so that if the creation of the MoabRecord fails, the creation of the PreservedObject gets rolled back, and we don't have a stray lying around.
Don't wrap more things than needed in a transaction. If multiple operations can succeed or fail independently, it's both semantically incorrect and needlessly inefficient to group them in a transaction.
- Likewise, try not to do any unnecessary processing in the Application.transaction block. It's fine to wrap nested chains of method calls in a transaction, as it might be a pain to decompose your code such that the transaction block literally only contained ActiveRecord operations. At the same time, the longer a transaction is open, the higher the chances that two different updates will try to update the same thing, possibly causing one of the updates to fail. If it's easy to keep something unnecessary out of the transaction block, it'd be wise to do so.
In the unlikely event that you're tempted to pro-actively do row-locking, e.g. due to concern about multiple processes updating a shared resource (e.g. if multiple processes were crawling the same storage root and doing moab validation), the Postgres docs seems to advise against that. Instead, specifying transaction isolation level seems to be recommended as the more robust and performant approach.
- Isolation level can be passed as a param to the transaction block, e.g. ApplicationRecord.transaction(isolation: :serializable) { ... }
- serializable is the strictest isolation level: https://www.postgresql.org/docs/current/static/transaction-iso.html
  - relevant Rails API doc: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html#method-i-transaction
  - more background and advice from the PG wiki: https://wiki.postgresql.org/wiki/Serializable
- If there is actually little contention in practice, the PG docs seem to indicate that specifying isolation level, even something as strict as serializable, should have little to no overhead (though it could increase chances of failed updates if there is actual resource contention, though code should already be prepared to handle failed DB updates gracefully). As such, there seems to be little risk to erring on the side of a strong isolation level when in doubt.
- E.g. you probably shouldn't do it this way: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html
You should likely be catching ActiveRecord exceptions outside of the transaction block, as you likely want to abort the transaction after the first ActiveRecord exception anyway. In other words, wrap transactions in exception handling, and not vice versa. See "Exception handling and rolling back": http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html
Use of ActiveRecord enums (http://edgeapi.rubyonrails.org/classes/ActiveRecord/Enum.html):
- In the enum definition, explicitly map each enum string to its underlying integer value, and leave a note admonishing future developers not to change enums that are already in use in production (or to think about the needed migration). This is to prevent unintentional re-mapping of existing enum values.
- Example: https://github.com/sul-dlss/preservation_catalog/blob/main/app/models/moab_record.rb