Fixing a stuck Moab - sul-dlss/preservation_catalog GitHub Wiki

Fixing a Stuck Moab

Problem: a new druid-version has been created by upstream processes, but prescat is not replicating it to archive endpoints.

Investigate the CompleteMoab of the stuck object

Note that the version is 3; we know it's version 4 on disk. Note also the status, "invalid_checksum"; for this druid, this was likely a holdover from the checksum mismatch bug.

[1] pry(main)> cm = CompleteMoab.by_druid('fm813sn1247')
=> [#<CompleteMoab:0x0000000005c70e48
  id: 405619,
  version: 3,
  preserved_object_id: 405619,
  moab_storage_root_id: 4,
  created_at: Sun, 21 Jan 2018 00:02:23 UTC +00:00,
  updated_at: Wed, 17 Oct 2018 09:22:35 UTC +00:00,
  last_moab_validation: Sun, 21 Jan 2018 00:02:23 UTC +00:00,
  last_checksum_validation: Tue, 07 Aug 2018 15:49:05 UTC +00:00,
  size: 84231379935,
  status: "invalid_checksum",
  last_version_audit: Mon, 06 Aug 2018 09:57:50 UTC +00:00,
  last_archive_audit: Wed, 17 Oct 2018 09:22:35 UTC +00:00>]

Validate checksums

Clear the invalid checksum error by validating again.

[6] pry(main)> CompleteMoab.by_druid('fm813sn1247').each(&:validate_checksums!)

Check cm again here to verify that the checksums now validate (output not shown).

Inventory the druid into the Catalog

We know it's really at version 4; force prescat to look again and update the record.

First, get the storage_location (we know it's on root 4 from the cm object, above).

[6] pry(main)> storageroot = MoabStorageRoot.find_by(id: 4)
=> #<MoabStorageRoot:0x0000000006131bd0
 id: 4,
 name: "services-disk05",
 created_at: Thu, 18 Jan 2018 18:55:35 UTC +00:00,
 updated_at: Thu, 18 Jan 2018 18:55:35 UTC +00:00,
 storage_location: "/services-disk05/sdr2objects">

Now do a synchronous catalog check (perform_later would run the same check asynchronously). Note how the third argument to MoabToCatalogJob is constructed from the storage_location and a druidtree path based on the druid.
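The druidtree path can be sketched in plain Ruby. In practice the druid-tools gem derives this for you; `druid_tree_path` below is a hypothetical helper written out for illustration only:

```ruby
# Derive the druidtree path passed as MoabToCatalogJob's third argument
# from the storage root's storage_location and the bare druid.
def druid_tree_path(storage_location, druid)
  # "fm813sn1247" -> fm / 813 / sn / 1247 / fm813sn1247
  tree = [druid[0, 2], druid[2, 3], druid[5, 2], druid[7, 4], druid]
  File.join(storage_location, *tree)
end

puts druid_tree_path("/services-disk05/sdr2objects", "fm813sn1247")
# => /services-disk05/sdr2objects/fm/813/sn/1247/fm813sn1247
```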

[7] pry(main)> MoabToCatalogJob.perform_now( storageroot, "fm813sn1247", "/services-disk05/sdr2objects/fm/813/sn/1247/fm813sn1247")
Performing MoabToCatalogJob (Job ID: a0abea21-e495-465f-8e2a-536ca5b929fc) from Resque(m2c) with arguments: #<GlobalID:0x0000000006180140 @uri=#<URI::GID gid://preservation-catalog/MoabStorageRoot/4>>, "fm813sn1247", "/services-disk05/sdr2objects/fm/813/sn/1247/fm813sn1247"
check_existence fm813sn1247 called
Enqueued ChecksumValidationJob (Job ID: 335ded51-6121-4987-a1df-01b6dde63697) to Resque(checksum_validation) with arguments: #<GlobalID:0x000000000655bbe8 @uri=#<URI::GID gid://preservation-catalog/CompleteMoab/405619>>
check_existence(fm813sn1247, services-disk05) CompleteMoab status changed from unexpected_version_on_storage to validity_unknown
check_existence(fm813sn1247, services-disk05) actual version (4) greater than CompleteMoab db version (3)
Performed MoabToCatalogJob (Job ID: a0abea21-e495-465f-8e2a-536ca5b929fc) from Resque(m2c) in 653.11ms
=> [{:cm_status_changed=>"CompleteMoab status changed from unexpected_version_on_storage to validity_unknown"},
 {:actual_vers_gt_db_obj=>"actual version (4) greater than CompleteMoab db version (3)"}]

Note that M2C picked up the new version number.

Check your work

Note that the version is now correct and the status is 'ok' -- a new version automatically invokes zipmaker, so the object should have been automatically replicated to endpoints.

Verify correct version and status

[9] pry(main)> cm = CompleteMoab.by_druid('fm813sn1247')
=> #<CompleteMoab:0x000000000312f568
 id: 405619,
 version: 4,
 preserved_object_id: 405619,
 moab_storage_root_id: 4,
 created_at: Sun, 21 Jan 2018 00:02:23 UTC +00:00,
 updated_at: Tue, 06 Nov 2018 22:37:38 UTC +00:00,
 last_moab_validation: Tue, 06 Nov 2018 22:37:38 UTC +00:00,
 last_checksum_validation: Tue, 06 Nov 2018 22:37:38 UTC +00:00,
 size: 84231431008,
 status: "ok",
 last_version_audit: Tue, 06 Nov 2018 22:37:38 UTC +00:00,
 last_archive_audit: Wed, 17 Oct 2018 09:22:35 UTC +00:00>

Verify creation of appropriate zipped_moab_versions

Note that version 4 was created today, shortly before the last_version_audit timestamp from above.

[11] pry(main)> zmv = ZippedMoabVersion.by_druid("fm813sn1247")
=> [#<ZippedMoabVersion:0x00000000065160c0
  id: 8300158,
  version: 4,
  complete_moab_id: 405619,
  zip_endpoint_id: 1,
  created_at: Tue, 06 Nov 2018 22:27:54 UTC +00:00,
  updated_at: Tue, 06 Nov 2018 22:27:54 UTC +00:00>,
 #<ZippedMoabVersion:0x0000000006515f58
  id: 7851133,
  version: 3,
  complete_moab_id: 405619,
  zip_endpoint_id: 1,
  created_at: Sun, 02 Sep 2018 23:07:11 UTC +00:00,
  updated_at: Sun, 02 Sep 2018 23:07:11 UTC +00:00>,
 #<ZippedMoabVersion:0x0000000006515d78
  id: 7851131,
  version: 2,
  complete_moab_id: 405619,
  zip_endpoint_id: 1,
  created_at: Sun, 02 Sep 2018 23:07:11 UTC +00:00,
  updated_at: Sun, 02 Sep 2018 23:07:11 UTC +00:00>,
 #<ZippedMoabVersion:0x0000000006515c38
  id: 7851129,
  version: 1,
  complete_moab_id: 405619,
  zip_endpoint_id: 1,
  created_at: Sun, 02 Sep 2018 23:07:11 UTC +00:00,
  updated_at: Sun, 02 Sep 2018 23:07:11 UTC +00:00>]

Manually run ZipMaker (should not be necessary)

This particular druid had another issue: a leftover druid-version-zip in our transfers area from a prior failed debugging attempt. As a result, the ZMV did not replicate to the endpoints. Firing off zipmaker manually fixed this.
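Before re-running zipmaker, it can help to check for leftover zip parts in the transfers area. A sketch, where ZIP_TMP is an assumed path and not actual prescat configuration:

```shell
# Assumed transfers location; adjust to the deployment's actual zip temp dir.
ZIP_TMP="${ZIP_TMP:-/sdr-transfers}"
# List any leftover zip parts for this druid-version:
find "$ZIP_TMP" -name 'fm813sn1247.v0004*' -print 2>/dev/null
# Remove any stale parts found before re-enqueueing ZipmakerJob.
```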

[12] pry(main)> ZipmakerJob.perform_later('fm813sn1247', 4)
  Enqueued ZipmakerJob (Job ID: 91594477-b182-4f47-8566-36c6b6a465fe) to Resque(zipmaker) with arguments: "fm813sn1247", 4
  => #<ZipmakerJob:0x00000000064075a8
   @arguments=["fm813sn1247", 4],
   @executions=0,
   @job_id="91594477-b182-4f47-8566-36c6b6a465fe",
   @priority=nil,
   @queue_name="zipmaker">

If zipmaker doesn't work, but a ZMV exists

Call replicate! on that specific existing version.

> zmv = ZippedMoabVersion.by_druid('hq932bt8082').find_by(version: 3)
=> #<ZippedMoabVersion:0x000000000461bf58 id: 6404, version: 3, last_existence_check: nil, complete_moab_id: 1401082, zip_endpoint_id: 2, created_at: Mon, 30 Jul 2018 16:59:51 UTC +00:00, updated_at: Mon, 30 Jul 2018 16:59:51 UTC +00:00, status: "unreplicated">
> zmv.replicate!
Enqueued ZipmakerJob (Job ID: 53eafb1e-e5fc-43b9-9ec0-feabb5e330a9) to Resque(zipmaker) with arguments: "hq932bt8082", 1
=> #<ZipmakerJob:0x0000000005df7aa0 @arguments=["hq932bt8082", 1], @executions=0, @job_id="53eafb1e-e5fc-43b9-9ec0-feabb5e330a9", @priority=nil, @queue_name="zipmaker">

The AWS CLI then verified that all expected zips were present on the endpoint:
fm/813/sn/1247/fm813sn1247.v0001.z01
fm/813/sn/1247/fm813sn1247.v0001.z02
fm/813/sn/1247/fm813sn1247.v0001.z03
fm/813/sn/1247/fm813sn1247.v0001.z04
fm/813/sn/1247/fm813sn1247.v0001.z05
fm/813/sn/1247/fm813sn1247.v0001.z06
fm/813/sn/1247/fm813sn1247.v0001.z07
fm/813/sn/1247/fm813sn1247.v0001.zip
fm/813/sn/1247/fm813sn1247.v0002.zip
fm/813/sn/1247/fm813sn1247.v0003.zip
fm/813/sn/1247/fm813sn1247.v0004.zip
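The segmented names above follow zip's split-archive convention: parts get suffixes .z01, .z02, ..., and the final part is named .zip. A sketch predicting the expected part names from a version's size, assuming a 10 GB segment size (an assumption here, not confirmed prescat config) and using the object's total size just for illustration:

```ruby
# Predict split-zip part names for a druid-version, given its size in bytes.
SEGMENT = 10 * 1024**3 # assumed 10 GB segment size

def expected_parts(druid, version, size_bytes, segment = SEGMENT)
  base = format("%s.v%04d", druid, version)
  count = (size_bytes.to_f / segment).ceil
  parts = (1...count).map { |n| format("%s.z%02d", base, n) }
  parts << "#{base}.zip" # the last segment keeps the plain .zip extension
end

puts expected_parts("fm813sn1247", 1, 84_231_431_008)
# eight parts: fm813sn1247.v0001.z01 .. .z07 plus fm813sn1247.v0001.zip
```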

PreservedObject and CompleteMoab Disagree

Here's an interesting one. PO for a druid says it's version 3. CM says it's version 5. Disk says it's version 6.

[13] pry(main)> po = PreservedObject.find_by(druid: 'kf921gd3855')
=> #<PreservedObject:0x000000000428e750
 id: 1415209,
 druid: "kf921gd3855",
 current_version: 3,
 created_at: Wed, 05 Sep 2018 02:39:49 UTC +00:00,
 updated_at: Tue, 18 Sep 2018 20:04:16 UTC +00:00,
 preservation_policy_id: 1>
[17] pry(main)> cm = CompleteMoab.by_druid('kf921gd3855')
=> [#<CompleteMoab:0x0000000006429ba8
  id: 1415217,
  version: 5,
  preserved_object_id: 1415209,
  moab_storage_root_id: 14,
  created_at: Wed, 05 Sep 2018 02:39:49 UTC +00:00,
  updated_at: Mon, 10 Dec 2018 07:33:02 UTC +00:00,
  last_moab_validation: Mon, 10 Dec 2018 07:33:02 UTC +00:00,
  last_checksum_validation: Mon, 10 Dec 2018 07:33:01 UTC +00:00,
  size: 108335,
  status: "ok",
  last_version_audit: Mon, 10 Dec 2018 07:33:02 UTC +00:00,
  last_archive_audit: Wed, 17 Oct 2018 14:43:46 UTC +00:00>]

The fix: tell PO it's version 5 (matching CM) then do M2C.

[19] pry(main)> po.current_version = 5 
=> 5
[20] pry(main)> po.save!
=> true
[21] pry(main)> MoabToCatalogJob.perform_now( storageroot, "kf921gd3855", "/services-disk15/sdr2objects/kf/921/gd/3855/kf921gd3855")
Performing MoabToCatalogJob (Job ID: 3e5e6732-cc8c-4c0f-8070-6f78b65933cc) from Resque(m2c) with arguments: #<GlobalID:0x0000000005b29be8 @uri=#<URI::GID gid://preservation-catalog/MoabStorageRoot/14>>, "kf921gd3855", "/services-disk15/sdr2objects/kf/921/gd/3855/kf921gd3855"
check_existence kf921gd3855 called
Enqueued ZipmakerJob (Job ID: 0bff41fa-d578-4ab9-8b7e-420b4124acb7) to Resque(zipmaker) with arguments: "kf921gd3855", 6
check_existence(kf921gd3855, services-disk15) actual version (6) greater than CompleteMoab db version (5)
Performed MoabToCatalogJob (Job ID: 3e5e6732-cc8c-4c0f-8070-6f78b65933cc) from Resque(m2c) in 674.09ms
=> [{:actual_vers_gt_db_obj=>"actual version (6) greater than CompleteMoab db version (5)"}]

Note that this fired off ZipMaker for the new version.