6. Digital Preservation - digitalutsc/islandora_lite_docs GitHub Wiki

Introduction

We are extending and building on the excellent preservation work in Islandora 2.0. We are not using Fedora. Instead, we verify checksums in Drupal and serialize content into Bags that comply with the BagIt standard and reflect an OCFL file structure, to be stored and monitored elsewhere.

Persistent Identifiers

In addition to NIDs (Node IDs) and Taxonomy Term IDs, each entity in Drupal has a UUID that is assigned by the system. We are adopting the principle of preferring the UUID for basic system functions and for tasks such as serialization of content. We believe that this UUID can then be used to map other IDs of importance to us, such as ARKs. We are also looking at existing Drupal modules that allow for aliasing node paths using the system-assigned UUID, as well as complementary metadata we map to the UUID, such as ARKs. Modules/features we feel might be of use include:

Technical Metadata - FITS

Any media entity in Drupal nodes can have multiple files. For any file in a Drupal media entity, we can present FITS-derived technical information in JSON format.

Each file entity has one JSON field, plus a set of default text fields whose values are extracted from that JSON using JMESPath. A site administrator can define new fields for extraction/indexing from the source JSON field, also using JMESPath. This will be described in the help text.
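The extraction idea can be sketched as a mapping from field names to paths into the JSON. The sketch below is only an illustration: it uses a simplified dotted-path lookup as a stand-in for JMESPath (which supports far richer expressions), and the JSON structure, field names, and paths are hypothetical, since the real structure depends on how the FITS.xml file was converted.

```python
import json

# Hypothetical FITS-derived JSON. The real structure depends on the
# XML-to-JSON conversion that produced it.
FITS_JSON = json.loads("""
{
  "fits": {
    "identification": {
      "identity": {"format": "JPEG File Interchange Format", "mimetype": "image/jpeg"}
    },
    "fileinfo": {"size": "204800"}
  }
}
""")

# Hypothetical field definitions: field name -> dotted path into the JSON.
FIELD_PATHS = {
    "field_mimetype": "fits.identification.identity.mimetype",
    "field_file_size": "fits.fileinfo.size",
    "field_missing": "fits.fileinfo.no_such_key",
}


def extract(data, path):
    """Walk a dotted path through nested dicts; return None if a key is absent."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_fields(data, field_paths):
    """Build a dict of field name -> extracted value for every configured path."""
    return {name: extract(data, path) for name, path in field_paths.items()}


if __name__ == "__main__":
    print(extract_fields(FITS_JSON, FIELD_PATHS))
```

A path that does not resolve yields `None` rather than an error, so a newly defined field can be added without breaking files whose JSON lacks that key.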

We currently assume that users already have JSON content, created from existing FITS.xml files using PHP's built-in XML-to-JSON conversion, and that this JSON will be imported using Islandora Workbench. This migration case is our priority.

Checksum Checking

We are running the Islandora RipRap application against our stored FITS-generated checksums to verify the fixity of files. We have extended RipRap to support this and to generate exportable reports of the checksum checks.
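The core of a fixity check is recomputing a file's checksum and comparing it with the stored value. The following is a minimal sketch of that comparison using only the Python standard library. It is not RipRap's implementation, and the function names, report keys, and default algorithm are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Compute a hex digest by reading the file in chunks (safe for large files)."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, stored_checksum, algorithm="sha256"):
    """Compare the current checksum with the stored one and return a small report."""
    expected = stored_checksum.strip().lower()
    actual = file_checksum(path, algorithm)
    return {
        "path": path,
        "algorithm": algorithm,
        "expected": expected,
        "actual": actual,
        "match": actual == expected,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "sample.bin")
        with open(sample, "wb") as handle:
            handle.write(b"example payload")
        # Record the checksum once, then verify against it later.
        stored = file_checksum(sample)
        print(check_fixity(sample, stored)["match"])
```

The `algorithm` parameter should match whatever algorithm produced the stored checksum; a list of such reports could be written out as the basis for an exportable report.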

Auditing Object History

By turning on Drupal's versioning functions and forcing a new version to be created whenever an object is touched, we can maintain a record of revisions to both content and media, with the ability to revert to an earlier revision.

OCFL Export

We have extended the Islandora Bagit module to support packaging objects in Islandora as OCFL objects conforming to the BagIt specification, including complete version histories of the Drupal content. See the OCFL specification.
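For readers unfamiliar with BagIt, the sketch below writes and re-verifies a minimal bag: a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` listing a checksum per payload file. It illustrates the BagIt layer only. It does not produce an OCFL layout, it is not how the Islandora Bagit module is implemented, and the payload file names are hypothetical.

```python
import hashlib
import os
import tempfile


def write_minimal_bag(bag_dir, payload):
    """Write payload (dict of relative path -> bytes) as a minimal BagIt bag."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = os.path.join(data_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(content)
        checksum = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{checksum}  data/{rel_path}")
    # The bag declaration required by the BagIt specification.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8", newline="\n") as handle:
        handle.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(manifest_lines) + "\n")
    return manifest_lines


def verify_bag(bag_dir):
    """Return the manifest paths whose files are missing or whose checksums differ."""
    failures = []
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue
            expected, rel_path = line.split(None, 1)
            target = os.path.join(bag_dir, *rel_path.split("/"))
            try:
                # Reads the whole file at once; acceptable for a sketch.
                with open(target, "rb") as payload_file:
                    actual = hashlib.sha256(payload_file.read()).hexdigest()
            except FileNotFoundError:
                actual = None
            if actual != expected:
                failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bag_dir:
        # Hypothetical payload names, for illustration only.
        write_minimal_bag(bag_dir, {"node.json": b"{}", "media/file.txt": b"hello"})
        print(verify_bag(bag_dir))
```

An empty list from `verify_bag` means every payload file listed in the manifest is present and unchanged.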

New Plugins for Islandora Bagit

AddMediaJson_IslandoraLite - gets the JSON representation of each media related to the node
AddMediaJsonLD_IslandoraLite - gets the JSON-LD representation of each media related to the node
AddFile_IslandoraLite - gets the actual file
AddFileJson_IslandoraLite - gets the JSON representation of each file associated with media
AddFileJsonLD_IslandoraLite - gets the JSON-LD representation of each file associated with media

Bags are named using a combination of a user-defined namespace (usually corresponding to the site from which the bag was derived), the Node ID, and the UUID for the node. This pattern of readable_systemID_UUID is used as the naming convention throughout the directories of the bag.
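A small sketch of the readable_systemID_UUID pattern follows. The underscore separator is taken from the pattern as written. The function names, the restriction on namespace characters, and the lowercase normalization of the UUID are assumptions made for illustration, not documented behavior.

```python
import re
import uuid


def bag_name(namespace, node_id, node_uuid):
    """Build a name of the form namespace_nodeID_UUID."""
    # Assumption: the namespace avoids underscores so the name splits unambiguously.
    if not re.fullmatch(r"[A-Za-z0-9-]+", namespace):
        raise ValueError("namespace may contain only letters, digits, and hyphens")
    # uuid.UUID validates the value; str() yields the lowercase hyphenated form.
    canonical_uuid = str(uuid.UUID(node_uuid))
    return f"{namespace}_{int(node_id)}_{canonical_uuid}"


def parse_bag_name(name):
    """Split a bag name back into (namespace, node ID, UUID)."""
    namespace, node_id, node_uuid = name.split("_", 2)
    return namespace, int(node_id), str(uuid.UUID(node_uuid))


if __name__ == "__main__":
    name = bag_name("utsc", 42, "123E4567-E89B-12D3-A456-426614174000")
    print(name)
    print(parse_bag_name(name))
```

Because UUIDs contain hyphens rather than underscores, splitting on the first two underscores recovers all three parts of the name.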

OCFL objects are validated using the Python validator and then stored in a user-defined location that includes a copy of the OCFL specification. Thus, exported bags contain all the data and metadata from Islandora_Lite and support rebuilding the repository from the data represented in the directory.