Audits (how to run as needed) - sul-dlss/preservation_catalog GitHub Wiki

Information about how to run audits

  • Moab to Catalog (M2C) existence/version check
  • Catalog to Moab (C2M) existence/version check
  • Checksum Validation (CV)
  • Catalog to Archive (C2A) / replication audit
  • Moab Validation (which is in the moab-versioning gem, but called by PreservationCatalog)

How do I know which one I want to run?

When in doubt, a fairly safe rule to use is:

  • If you're investigating an issue with the on-prem copy of a Moab, you probably want to run checksum validation for the druid. See checksum validation for a single druid.
  • If you're investigating an issue with a cloud copy, you probably want to run Catalog to Archive (replication audit) for the druid.

Moab to Catalog (M2C) existence/version check

See [Audits (basic info) wiki](http://github.com/sul-dlss/preservation_catalog/wiki/Validations-for-Moabs) for basic info about M2C validation.

Rake task for Single Root

  • You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
  • You do NOT need quotes for the root name
  • Checks will be run asynchronously via MoabToCatalogJob
RAILS_ENV=production bundle exec rake prescat:audit:m2c[root_name]

Via Rails Console

In console, first locate a MoabStorageRoot, then call m2c_check! to enqueue asynchronous executions via MoabToCatalogJob. Storage root information is available from settings.yml (shared_configs for deployments).

Single Root

msr = MoabStorageRoot.find_by!(storage_location: '/path/to/storage')
msr.m2c_check!

All Roots

MoabStorageRoot.find_each { |msr| msr.m2c_check! }

Single Druid

To M2C a single druid synchronously, in console:

CatalogUtils.check_existence_for_druid('jj925bx9565')

Druid List

For a predetermined list of druids, a convenience wrapper for the above command is check_existence_for_druid_list.

  • The parameter is the file path of a CSV file listing the druids.
    • The first column of the CSV should contain druids, without the "druid:" prefix.
    • File should not contain headers.
CatalogUtils.check_existence_for_druid_list('/file/path/to/your/csv/druid_list.csv')

Note: it should not typically be necessary to serialize a list of druids to CSV. Just iterate over them and use the "Single Druid" approach.
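As a rough illustration of the note above, the sketch below normalizes an in-memory list of druids so each one can be passed to the "Single Druid" call. The helper name normalize_druids and the sample druids are placeholders for illustration and are not part of preservation_catalog; the actual check is shown only as a comment because it needs the Rails console.

```ruby
# Hypothetical helper (not part of preservation_catalog): strips whitespace
# and an optional "druid:" prefix, then drops blanks and duplicates.
def normalize_druids(druids)
  druids.map { |d| d.to_s.strip.delete_prefix('druid:') }
        .reject(&:empty?)
        .uniq
end

# Sample input only; substitute the druids you are investigating.
bare = normalize_druids(['druid:jj925bx9565', 'bj102hs9687 ', 'jj925bx9565'])

# In the Rails console, the loop body would be the single-druid call:
#   bare.each { |druid| CatalogUtils.check_existence_for_druid(druid) }
bare.each { |druid| puts "would check #{druid}" }
```

Each call runs synchronously, so a long list will take a while to work through.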

Catalog to Moab (C2M) existence/version check

See [Audits (basic info) wiki](http://github.com/sul-dlss/preservation_catalog/wiki/Validations-for-Moabs) for basic info about C2M validation.

Rake task for Single Root

  • You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
  • You do NOT need quotes for the root name.
  • You cannot provide a date threshold: it will perform the validation for every MoabRecord prescat has for the root.
  • Checks will be run asynchronously via CatalogToMoabJob
RAILS_ENV=production bundle exec rake prescat:audit:c2m[root_name]

Via Rails Console

In console, first locate a MoabStorageRoot, then call c2m_check! to enqueue asynchronous executions for the MoabRecords associated with that root via CatalogToMoabJob. Storage root information is available from settings.yml (shared_configs for deployments).

  • The optional date/timestamp argument is a threshold: the check runs on all catalog entries whose last version check happened BEFORE the argument. You can pass a string such as '2018-01-22 22:54:48 UTC' or an ActiveSupport time expression such as 1.week.ago. The default threshold is the current time, so by default every entry counts as due.
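To make the threshold semantics concrete, here is a minimal plain-Ruby sketch of the comparison described above. The Entry struct, the due_for_check? helper, and the last_version_audit field name are assumptions for illustration. Treating never-checked entries as due is also an assumption, so confirm it against the model scopes in the codebase.

```ruby
require 'time'

# Hypothetical stand-in for a catalog entry; only the timestamp matters here.
Entry = Struct.new(:druid, :last_version_audit)

# An entry is due when it was last checked before the threshold.
# Assumption: entries that were never checked (nil) are also due.
def due_for_check?(entry, threshold)
  entry.last_version_audit.nil? || entry.last_version_audit < threshold
end

threshold = Time.parse('2018-01-22 22:54:48 UTC')
entries = [
  Entry.new('jj925bx9565', Time.parse('2018-01-01 00:00:00 UTC')), # checked before threshold
  Entry.new('vc647pg5260', Time.parse('2018-02-01 00:00:00 UTC')), # checked after threshold
  Entry.new('bj102hs9687', nil)                                    # never checked
]

due = entries.select { |e| due_for_check?(e, threshold) }.map(&:druid)
puts due.inspect
```

With the default threshold (the current time), every entry with a past timestamp would be selected.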

Single Root

This enqueues work for all the objects associated with the first MoabStorageRoot in the database, then the last:

MoabStorageRoot.first.c2m_check!
MoabStorageRoot.last.c2m_check!

This enqueues work for objects on a given root that have not been checked in the past 3 days:

msr = MoabStorageRoot.find_by!(storage_location: '/path/to/storage')
msr.c2m_check!(3.days.ago)

All Roots

This enqueues the checks from all roots similarly.

MoabStorageRoot.find_each { |msr| msr.c2m_check!(3.days.ago) }

Checksum Validation (CV)

See [Audits (basic info) wiki](http://github.com/sul-dlss/preservation_catalog/wiki/Validations-for-Moabs) for basic info about CV validation.

Rake task for Single Root

  • You need to know the MoabStorageRoot name, available from settings.yml (shared_configs for deployments)
  • You do NOT need quotes for the root name.
  • It will perform checksum validation for every MoabRecord prescat has for the root, ignoring the usual restriction to records whose last validation is older than the fixity_ttl threshold (currently 90 days).
  • Checks will be run asynchronously via ChecksumValidationJob
RAILS_ENV=production bundle exec rake prescat:audit:cv[root_name]

Via Rails Console

In console, first locate a MoabStorageRoot, then call validate_expired_checksums! to enqueue asynchronous executions for the MoabRecords associated with that root via ChecksumValidationJob. Storage root information is available from settings.yml (shared_configs for deployments).

Single Root

From console, this queues objects on the named storage root for asynchronous CV:

msr = MoabStorageRoot.find_by!(name: 'fixture_sr3')
msr.validate_expired_checksums!

All Roots

This is also asynchronous, for all roots:

MoabStorageRoot.find_each { |msr| msr.validate_expired_checksums! }

Single Druid

Synchronously, from Rails console (will take a long time for very large objects):

Audit::ChecksumValidatorUtils.validate_druid(druid)

If you're investigating a druid with a non-ok status, first, check in with the repository manager about the situation and get their sign-off to proceed with validation. Then, in the Rails console, invoke the method mentioned above.

If it passes, and the version matches the cocina in dor-services-app, things are probably good, but that's where repository manager opinion is helpful. Running checksum validation should also clear the error status if things are indeed ok, since that's the most thorough check we have for Moabs, and it tries to update recorded status to match reality.

If this detects errors, confer with the repository manager on next steps for investigation/remediation.

Druid List

  • Pass the file path of the CSV as the parameter. The first column should contain druids without the "druid:" prefix, and the file should contain no headers.

Synchronously, from Rails console:

Audit::ChecksumValidatorUtils.validate_list_of_druids('/file/path/to/your/csv/druid_list.csv')
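If you need to produce such a file, the following sketch writes one in the shape described above: one druid per row in the first column, no prefix, no header row. The helper name write_druid_csv, the temporary path, and the sample druids are illustrative placeholders, not anything provided by preservation_catalog.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: writes one bare druid per row, with no header row.
def write_druid_csv(path, druids)
  CSV.open(path, 'w') do |csv|
    druids.each { |druid| csv << [druid.delete_prefix('druid:')] }
  end
  path
end

# Sample values only; use your own path and druids.
csv_path = File.join(Dir.tmpdir, 'druid_list.csv')
write_druid_csv(csv_path, ['druid:jj925bx9565', 'vc647pg5260'])
puts File.read(csv_path)
```

The resulting path can then be passed to the druid-list method shown above.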

Druids with a particular status on a particular storage root

For example, if you wish to run CV on all the "validity_unknown" druids on storage root 15, from console:

Audit::ChecksumValidatorUtils.validate_status_root(:validity_unknown, 'services-disk15')

Valid status strings

Catalog to Archive (C2A) replication audit

For a single druid:

From Rails console:

druid = 'vc647pg5260'
po = PreservedObject.find_by(druid: druid)
po # <-- use this to see the value for last_archive_audit -- is it recent enough?
MoabReplicationAuditJob.perform_now(po)

Then wait for it to run through the PartReplicationAuditJobs for the individual zip parts. Watch Honeybadger for errors.
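When judging whether last_archive_audit is "recent enough", it can help to express it as an age in days. The sketch below is one way to do that in plain Ruby. The helper name days_since_audit is hypothetical, and what counts as recent enough is left to you, since this page does not define a cutoff.

```ruby
# Hypothetical helper: whole days elapsed since the given audit time,
# or nil if the object has never been audited.
def days_since_audit(last_archive_audit, now: Time.now)
  return nil if last_archive_audit.nil?

  ((now - last_archive_audit) / 86_400).floor
end

# Fixed times are used here so the example is reproducible.
# In the Rails console you would pass po.last_archive_audit instead.
reference_time = Time.utc(2024, 1, 31)
age_in_days = days_since_audit(Time.utc(2024, 1, 1), now: reference_time)
puts age_in_days
```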

If errors are detected, see https://github.com/sul-dlss/preservation_catalog/wiki/Replication-errors#troubleshooting