Moab Audit Failures - sul-dlss/preservation_catalog GitHub Wiki

Audit jobs detect errors

Note that the jobs that regularly run audit tasks should not fail, from the Sidekiq/ActiveJob queue management perspective, when they detect errors in preserved content. Instead, they should record the problem in the relevant database status field (e.g. complete_moabs.status, zip_parts.status) and send a Honeybadger alert. We sometimes fall short of this goal when we hit unforeseen failure modes; see, e.g., https://github.com/sul-dlss/preservation_catalog/issues/1696.
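The division of labor described above (detected problems become a status plus an alert, not a failed job) can be sketched as follows. This is a minimal, hypothetical illustration, not preservation_catalog code: `MoabRecord`, `AlertSink`, `AuditError`, `run_audit`, and `checksum_mismatch` are invented names standing in for a status-bearing database row, Honeybadger, an audit job, and one audit check, and status strings other than `invalid_checksum` are assumptions.

```python
"""Hypothetical sketch of the "record status and alert, don't fail the job" pattern.

All names here are illustrative only; they are not preservation_catalog classes.
"""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MoabRecord:
    """Hypothetical stand-in for a row with a status column (e.g. complete_moabs)."""
    druid: str
    status: str = "validity_unknown"  # assumed initial status value


@dataclass
class AlertSink:
    """Hypothetical stand-in for an error-reporting service such as Honeybadger."""
    alerts: List[str] = field(default_factory=list)

    def notify(self, message: str) -> None:
        self.alerts.append(message)


class AuditError(Exception):
    """Raised by a check when it detects a problem with preserved content."""

    def __init__(self, status: str, message: str) -> None:
        super().__init__(message)
        self.status = status


def run_audit(record: MoabRecord, check: Callable[[MoabRecord], None], sink: AlertSink) -> None:
    """Run one audit check; a detected problem becomes a status plus an alert.

    AuditError is caught and not re-raised, so the job finishes normally from
    the queue's point of view. Any other exception propagates and fails the
    job, which is where an unforeseen failure mode would surface.
    """
    try:
        check(record)
    except AuditError as err:
        record.status = err.status
        sink.notify(f"{record.druid}: {err.status}: {err}")
        return
    record.status = "ok"  # assumed status name for a clean audit


def checksum_mismatch(record: MoabRecord) -> None:
    """Hypothetical check that always reports a checksum problem."""
    raise AuditError("invalid_checksum", "manifest checksum does not match file")


if __name__ == "__main__":
    rec = MoabRecord("bc123df4567")  # example druid
    sink = AlertSink()
    run_audit(rec, checksum_mismatch, sink)
    print(rec.status)   # invalid_checksum
    print(sink.alerts)  # ['bc123df4567: invalid_checksum: manifest checksum does not match file']
```

The design choice worth noting is that only detected problems (`AuditError`) are swallowed; anything else still fails the job, which is consistent with the caveat that unforeseen failure modes can slip past this goal.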

On-Premises Storage Roots

If auditing flags possible corruption of a Moab that resides in a local storage root (e.g. if ChecksumValidator, MoabToCatalog, or CatalogToMoab sends an alert or sets a status such as invalid_checksum), there is unfortunately no one-size-fits-all approach to remediation. Such occurrences are rare, but remediation can range from undoing mistaken hand edits to a manifest file, to decommissioning the Moab and re-accessioning the content if the preserved content is deemed missing or corrupt (assuming the original content is still obtainable).

For an example of this situation, see https://jirasul.stanford.edu/jira/browse/SDRO-391.

⚠️ Note ⚠️ Any hand editing of preservation content, whether local Moabs or cloud-replicated archives, requires prior approval from the repository manager. In general, editing content that is already preserved is bad practice, and we want to avoid it wherever possible (ideally by never preserving bad copies in the first place, and by not losing or corrupting preserved content). When an edit is necessary, we want to keep clear documentation of what was done.

If Moab content must be edited, replicate new copies to the cloud once the Moab is back in a good state, so that the cloud archives reflect the corrected content.