Generic Digitization Workflow - uchicago-library/ldr_documentation GitHub Wiki

Generic Digitization Workflow

About this document

This document was written to give a Migrations Specialist (MS) an overview of our digitization workflow. The MS writes scripts to prepare images for ingestion into OCFL, and to produce EDM triples and IIIF Manifests. These scripts should be documented and maintained so that they are easy to re-run when changes come in.

Most of the following steps must take place one after the other (in series); these steps are numbered. Some steps may happen at the same time (in parallel); these steps use letters instead of numbers. For example, steps 3.A and 3.B may happen in parallel, but steps 3.A.1 and 3.A.2 must happen in series.

1. Project Setup

The Principal Investigator / Project Initiator (PI) or Project Manager (PM) creates a folder to store project documentation and creates a spreadsheet to serve as the project inventory. This inventory helps to manage the entire project and should start with one row per digital object. Because every step in this process depends on the inventory, it needs to be completed before any work begins. In cases where that is impossible, the inventory should be produced in batches that are each as large as possible.

The PI/PM should add the following column headers to the inventory spreadsheet:

  • preservation identifier
  • dcterms:identifier
  • dcterms:title
  • dcterms:creator
  • dcterms:date
  • dcterms:description
  • ARK
  • IIIF Image Server URL
  • IIIF Manifest

Using whatever metadata is available, the PI/PM fills out the Dublin Core metadata columns (those that begin with “dcterms:”).

Preservation staff creates a preservation identifier (e.g., gms-0931) to track each digital object. This identifier is used for directory and file naming during the scanning process. Preservation staff adds the preservation identifier to the appropriate column in the spreadsheet.
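As an illustration of the inventory layout described above, here is a minimal sketch that writes the header row plus one row per digital object. It assumes the inventory is kept as (or exported to) CSV, which this document does not specify; the function name `new_inventory` and the second identifier `gms-0932` are hypothetical.

```python
import csv
import io

# Column headers listed in this document for the project inventory.
INVENTORY_COLUMNS = [
    "preservation identifier",
    "dcterms:identifier",
    "dcterms:title",
    "dcterms:creator",
    "dcterms:date",
    "dcterms:description",
    "ARK",
    "IIIF Image Server URL",
    "IIIF Manifest",
]


def new_inventory(preservation_ids, out):
    """Write the header row, then one otherwise empty row per digital object."""
    writer = csv.DictWriter(out, fieldnames=INVENTORY_COLUMNS, restval="")
    writer.writeheader()
    for pid in preservation_ids:
        writer.writerow({"preservation identifier": pid})


# Example: two objects, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
new_inventory(["gms-0931", "gms-0932"], buffer)
inventory_csv = buffer.getvalue()
```

The remaining columns stay empty here because later steps (ARK minting, ingestion, manifest creation) fill them in.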

2. Mint ARK Identifiers

The PI/PM alerts the MS that the inventory is now complete. The MS mints an ARK identifier (e.g., ark:61001/b2jt29x3v493) for each digital object by writing SQL that INSERTs a new row for that object into our internal database. The MS then adds each ARK to the ARK column of the inventory spreadsheet.
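The minting step might look roughly like the sketch below. Everything about the database is an assumption: the table name `arks`, its columns, and the use of SQLite stand in for the internal database, whose schema is not described here. The shoulder `b2` and the ten-character random name are inferred from the example ARKs in this document, and no check character is computed.

```python
import secrets
import sqlite3

NAAN = "61001"    # taken from the example ARKs in this document
SHOULDER = "b2"   # assumption: inferred from the example ARKs
# Digits plus consonants, an alphabet commonly used for ARK names.
BETANUMERIC = "0123456789bcdfghjkmnpqrstvwxz"


def mint_ark(length=10):
    """Return a random candidate ARK (no check character is computed)."""
    name = "".join(secrets.choice(BETANUMERIC) for _ in range(length))
    return f"ark:{NAAN}/{SHOULDER}{name}"


def register_arks(conn, preservation_ids):
    """INSERT one new ARK row per digital object; return {preservation id: ARK}."""
    minted = {}
    for pid in preservation_ids:
        while True:
            ark = mint_ark()
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute(
                        "INSERT INTO arks (ark, preservation_id) VALUES (?, ?)",
                        (ark, pid),
                    )
                break
            except sqlite3.IntegrityError:
                continue  # ARK already exists in the table; try another one
        minted[pid] = ark
    return minted


# Example with an in-memory database and a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arks (ark TEXT PRIMARY KEY, preservation_id TEXT NOT NULL)")
minted = register_arks(conn, ["gms-0931", "gms-0932"])
```

Letting the primary key reject duplicates keeps the uniqueness guarantee in the database rather than in the script.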

3.A.1 Scan, Quality Control, and File Prep

Preservation staff creates digital master files for each object and performs quality control. Files may be corrected or re-scanned if necessary. After quality control, Preservation staff deposits the digital master files and brief metadata in the pickup area, and alerts the MS that new objects are available for pickup.

3.A.2 Well-formedness Check, Validation, OCFL Ingestion

The MS writes scripts to verify that files have been named according to specs, that they are well-formed and valid, and that all files produced conform to content specifications. These scripts should produce error logs that can be sent to Preservation staff to guide re-work. Once files have been verified, the MS ingests them into OCFL. Because the image server is set to pull images directly from OCFL, image files are available via the IIIF image server at this point. The MS adds an IIIF link for each image to the project spreadsheet, e.g., https://iiif-server.lib.uchicago.edu/ark:61001/b29b1tv7xn73/00000001/.
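A verification script of this kind could start from the sketch below. The naming pattern (preservation identifier, underscore, eight-digit sequence number, `.tif`) and the assumption that masters are TIFF files are both hypothetical, since the actual specs are not given here. The header test only confirms the TIFF byte-order signature; it is not a full validity check. The URL helper follows the shape of the example link above.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming spec, e.g. gms-0931_00000001.tif
NAME_PATTERN = re.compile(r"^[a-z]+-\d{4}_\d{8}\.tif$")
# TIFF files begin with one of these signatures (little- or big-endian).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def check_file(path):
    """Return a list of error messages for one file (empty if it passes)."""
    path = Path(path)
    errors = []
    if not NAME_PATTERN.match(path.name):
        errors.append(f"{path.name}: name does not match the naming spec")
    try:
        with path.open("rb") as fh:
            header = fh.read(4)
    except OSError as exc:
        errors.append(f"{path.name}: cannot be read ({exc})")
        return errors
    if header not in TIFF_MAGIC:
        errors.append(f"{path.name}: missing TIFF header")
    return errors


def check_directory(directory):
    """Collect errors for every file in a directory, in name order."""
    errors = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            errors.extend(check_file(path))
    return errors


def iiif_image_url(ark, sequence):
    """Build an image link shaped like the example in this document."""
    return f"https://iiif-server.lib.uchicago.edu/{ark}/{sequence:08d}/"


# Example: one well-named file with a TIFF header, one file failing both checks.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "gms-0931_00000001.tif").write_bytes(b"II*\x00" + b"\x00" * 4)
    (Path(tmp) / "scan 2.tif").write_bytes(b"not a tiff")
    error_log = check_directory(tmp)
```

Returning plain message strings makes it easy to write the results to a log file that can be sent to Preservation staff.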

3.B Cataloging

The PI/PM alerts Cataloging staff that objects are available via IIIF. Cataloging staff adds actionable ARK identifiers (e.g., https://n2t.net/ark:61001/b2jt29x3v493) to the MARC record. Cataloging staff uploads full metadata (e.g., [proxy].mrc, [proxy].mrc.xml) to the project folder.

4.A Add Full Metadata to the LDR

The MS takes the full metadata for each digital object from the project folder and adds it to the corresponding OCFL object.

4.B Produce EDM Triples

The MS writes scripts to create EDM triples from full metadata records according to the mapping supplied by the DLDC. These scripts should be maintained so that EDM triples can be rebuilt. The MS adds the EDM triples to OCFL and to Node.
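To give a sense of what such a script produces, the sketch below serializes a few triples as N-Triples using only the standard library. The mapping is a placeholder, not the DLDC mapping: it simply copies inventory-style `dcterms:` fields and types the object as `edm:ProvidedCHO`. Using the actionable ARK URL as the subject is also an assumption.

```python
DCTERMS = "http://purl.org/dc/terms/"
EDM = "http://www.europeana.eu/schemas/edm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Placeholder mapping: the real one is supplied by the DLDC.
FIELDS = ("identifier", "title", "creator", "date", "description")


def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def edm_triples(ark, record):
    """Return N-Triples lines describing one digital object."""
    subject = f"<https://n2t.net/{ark}>"  # assumption: actionable ARK as subject
    lines = [f"{subject} <{RDF_TYPE}> <{EDM}ProvidedCHO> ."]
    for field in FIELDS:
        value = record.get(f"dcterms:{field}")
        if value:  # skip empty or missing fields
            lines.append(f"{subject} <{DCTERMS}{field}> {literal(value)} .")
    return lines


# Example with a hypothetical record.
triples = edm_triples(
    "ark:61001/b2jt29x3v493",
    {"dcterms:title": 'A "sample" title', "dcterms:date": ""},
)
```

Because the function is a pure transformation of the metadata record, it can be re-run whenever the mapping or the source metadata changes.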

5. Create IIIF Manifests

Based on the full digital object packet in OCFL, including master image files and finalized metadata records, the MS writes scripts to produce IIIF manifests. Metadata included in IIIF manifests should match a spec supplied by the DLDC. Manifests should be uploaded to the manifest server. Once that has been done, they will be available at URLs like https://iiif-collection.lib.uchicago.edu/object/ark:/61001/b2d05353d58m.json.
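A manifest-building script might follow the outline below. It assumes IIIF Presentation API 3.0 and an Image API 3 service at `level1`, neither of which is stated in this document, and it includes only a label rather than the metadata required by the DLDC spec. The canvas, page, and annotation identifiers are hypothetical; the manifest and image URL shapes follow the examples given in this document.

```python
import json

MANIFEST_BASE = "https://iiif-collection.lib.uchicago.edu/object/"
IMAGE_BASE = "https://iiif-server.lib.uchicago.edu/"


def build_manifest(ark, title, pages):
    """Build a manifest dict; pages is a list of (sequence, width, height)."""
    name = ark.split(":", 1)[1].lstrip("/")  # e.g. "61001/b2d05353d58m"
    manifest_id = f"{MANIFEST_BASE}ark:/{name}.json"
    canvases = []
    for sequence, width, height in pages:
        service = f"{IMAGE_BASE}ark:{name}/{sequence:08d}"
        canvas_id = f"{manifest_id}/canvas/{sequence}"  # hypothetical id scheme
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    "target": canvas_id,
                    "body": {
                        "id": f"{service}/full/max/0/default.jpg",
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                        # Assumption: the image server speaks Image API 3.
                        "service": [{"id": service, "type": "ImageService3",
                                     "profile": "level1"}],
                    },
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [title]},
        "items": canvases,
    }


# Example: a one-page object with hypothetical pixel dimensions.
manifest = build_manifest("ark:61001/b2d05353d58m", "Sample object", [(1, 4000, 6000)])
manifest_json = json.dumps(manifest, indent=2)
```

In practice the page dimensions would be read from the master image files in OCFL rather than passed in by hand.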

Abbreviations used in this document

  • ARK: Archival Resource Key
  • DC: Dublin Core
  • DLDC: Digital Library Development Center
  • EDM: Europeana Data Model
  • IIIF: International Image Interoperability Framework
  • OCFL: Oxford Common File Layout
  • MS: Migrations Specialist
  • PI: Principal Investigator / Project Initiator
  • PM: Project Manager