2023.05.10 Community Meeting - OCFL/spec GitHub Wiki

Call-in Details

Zoom Link: https://emory.zoom.us/j/7074635164?pwd=SExsZ1NwYjVlNy9ZWHJHZ09BYXVxQT09

Attendees

  1. Jürgen Enge
  2. Jared Whiklo ⭐
  3. Dan Field
  4. Arran Griffith
  5. Andreas Nef
  6. Neil Jeffries
  7. Seth Erickson
  8. Stefano Cossu
  9. Tom Wrobel
  10. James Alexander
  11. Jessica Colati

Agenda

  1. Welcome
    1. Volunteer Notetaker
    2. Community updates (introductions, updates, implementations, plans, etc)
  2. Demonstration/presentation of (complex) OCFL extensions (Jürgen Enge)
  3. Next community meeting: Wednesday June 14th 8pm ET / 5pm PT | Thursday June 15th 12am GMT / 10am AEST (Convert to your time zone)

Notes

  1. Presentation - https://github.com/je4/gocfl
    • The OCFL structure is written into a zip file without compression, with a sidecar SHA checksum file. Compression is skipped because the local use case is holding already-compressed TIFFs from books together.
    • Trying to work with the OCFL structure, as opposed to just putting things into OCFL and using its validation.
    • The extension runs a migration of the files (using Siegfried and Apache Tika), which adds a bunch of metadata and adds an AES file and the encrypted key.
    • The second version is reordered and adds a data directory, which contains the original content directory and one for metadata.
    • Has multiple extensions all managed by an "extension manager" (which is a new extension)
    • Inventory file structure is quite different
      • has a Files collection with file checksum as key
      • has an extension key with the extensions used
      • has filesystem metadata
      • mimetype and PRONOM IDs are also stored
      • also has some technical metadata
    • Extensions can be of any of these types
      • Management extensions which do sorting & exclusion based on a hook
      • Structural extensions for controlling the storage layout, manifest path & external path.
      • Metadata extensions: filesystem, to control the re-ordering of data & content paths; technical, to control keeping empty directories; and semantic, which has the indexer that runs the metadata extraction methods.
      • Content extensions, which generate new content; the migration extension creates new files as it runs to create digests.
      • Validation extensions, to perform some validation of the content.
      • An "impossible" extension: a container extension that would allow creating a zip file to contain the storage and the encryption.
    • Much of this work could be done prior to storing in OCFL, but Jürgen's system allows it to happen consistently for all objects as they are placed in OCFL.
  2. The E-ARK project is looking at a standard AIP; Neil is working on aligning this work with OCFL.
  3. There is some alignment of PREMIS with Jürgen's work.
  4. Thinking about the difference between working on top of the specification versus working within the specification: what are the costs and benefits of each?
  5. Wondering about the concern that making the specification more complex would make it more difficult to implement.
    • For 2.0 the editors are concerned with adding a small number of new features while keeping the core clean.
  6. Wondering how people are handling the multitude of files generated in a migration of a Fedora 3 repository. Using XFS to avoid inode limits. Looking for any recommendations or suggestions.
    • Tom is having a slightly related issue with nested directories, which are causing problems for rsync. Fedora 6 has added 3 layers between the storage root and the object root. Harvard have been discussing this and have decided not to touch the files, using a middle layer to handle actions instead. That mid-term management layer only handles the latest versions. For the 2.0 spec, containerization is under consideration: one container per object, with only the latest version kept uncontainerized.
    • Perhaps Dan's problems will go away once the migration is complete. Neil suggests using a separate assembly layer before moving your objects into OCFL.
    • Tom noticed that he doesn't see as many problems when using an OCFL-aware tool, but he has encountered issues when using normal file handling tools. Neil suggests you should not delete an OCFL repository 😉 but, if you must, formatting the volume is faster.
  7. Andreas is wondering about the roadmap for 2.0; Neil pointed to the open issues on the OCFL repository, which need to be worked through. Andreas is also wondering about WORM compliance, as the top-level inventory file is a convenience, but one that officially causes a problem.
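
As an illustration of the packaging approach described in note 1 (an uncompressed zip with a sidecar checksum file), here is a minimal Python sketch. It is not taken from gocfl: the function name `package_uncompressed`, the `.sha512` sidecar naming, the sidecar line format, and the choice of SHA-512 are all assumptions made for the example, since the notes only say "sha checksum".

```python
import hashlib
import zipfile
from pathlib import Path


def package_uncompressed(src_dir: Path, zip_path: Path) -> Path:
    """Zip src_dir without compression and write a checksum sidecar next to it.

    Hypothetical helper for illustration only; zip_path must live outside src_dir.
    """
    # ZIP_STORED means entries are stored as-is, with no compression applied.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src_dir).as_posix())

    # Hash the finished zip in chunks so large packages do not need to fit in memory.
    digest = hashlib.sha512()  # assumption: the notes do not name the SHA variant
    with zip_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)

    # Assumed sidecar convention: "<zip name>.sha512" containing "<hex digest>  <zip name>".
    sidecar = zip_path.with_name(zip_path.name + ".sha512")
    sidecar.write_text(f"{digest.hexdigest()}  {zip_path.name}\n")
    return sidecar
```

Storing entries uncompressed costs little when the payload is already-compressed TIFFs, and it lets the sidecar checksum be verified with ordinary tools without unpacking the object.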

Recording

New Action Items

  • Jürgen to provide his slides or (even better) record his own presentation and send a copy to Neil to disseminate.