Home - mrmiguez/citrus GitHub Wiki

citrus wiki

Overview

Collective Information Transformation and Reconciliation Utility Service (citrus) is a metadata aggregation and transformation tool for DPLA hubs. The program takes metadata collected by repox and transforms it into DPLA MAPv4-compatible JSON-LD for ingest into the Digital Public Library of America.

fl-dpla tech stream

citrus is written in Python 3. It is designed to be run on the same machine hosting repox.

Goals and future development

The purpose of citrus is to give metadata aggregators a tool for flexible, automated metadata transformation.

Files & purpose

  • citrus.py - defines the transformation scenarios
  • citrus-run.py - runs the citrus utility
  • citrus_config.py - configuration settings
  • assets/ - additional services for thumbnails and reconciliation functions

Configuring citrus

The transformation and plugin services to be run are defined in the citrus_config.py file.

CONFIG_DICT is a dictionary of expected repox exports and their requisite mappings. Keys are shortened forms of repox export directory names.
They are expanded by citrus-run using glob.glob(key*), so full names aren't required, but they should be descriptive enough to avoid collisions with other sets.
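The key expansion described above can be sketched with the standard glob module. This is a minimal illustration, not the code in citrus-run: the helper name expand_key and the directory names are hypothetical.

```python
import glob
import os
import tempfile


def expand_key(export_dir, key):
    """Return the sorted paths under export_dir whose names start with key."""
    # Escape the literal part so only the trailing * acts as a wildcard.
    pattern = glob.escape(os.path.join(export_dir, key)) + "*"
    return sorted(glob.glob(pattern))


# Demonstration with throwaway directories (names are made up).
with tempfile.TemporaryDirectory() as export_dir:
    for name in ("fsu_digital_library", "fiu_coral_gables", "fiu_miami_metro"):
        os.mkdir(os.path.join(export_dir, name))

    # A descriptive key matches exactly one export directory.
    matches = [os.path.basename(p) for p in expand_key(export_dir, "fsu")]

    # A key that is too short collides with more than one set.
    ambiguous = [os.path.basename(p) for p in expand_key(export_dir, "fiu")]
```

Here "fsu" resolves to a single directory, while "fiu" matches two, which is the kind of collision a more descriptive key avoids.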

Values are tuples storing various run settings.

  1. metadata prefix: currently only 'dc', 'qdc', and 'mods' are supported.
  2. dictionary of thumbnail service values
  3. aggregation.dataProvider
  4. aggregation.intermediateProvider
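As a sketch of the tuple layout listed above, a CONFIG_DICT entry might look like the following. The key, the contents of the thumbnail dictionary, and the provider names are all placeholders, and check_config is a hypothetical helper that is not part of citrus.

```python
# Hypothetical entry: every value below is illustrative only.
CONFIG_DICT = {
    "fsu": (
        "mods",                            # 1. metadata prefix
        {"name": "example_service"},       # 2. thumbnail service values (keys assumed)
        "Example University Libraries",    # 3. aggregation.dataProvider
        "Example Regional Consortium",     # 4. aggregation.intermediateProvider
    ),
}

# Prefixes the wiki lists as currently supported.
SUPPORTED_PREFIXES = {"dc", "qdc", "mods"}


def check_config(config):
    """Return a list of human-readable problems found in a config dictionary."""
    problems = []
    for key, value in config.items():
        if not isinstance(value, tuple) or len(value) != 4:
            problems.append(f"{key}: expected a 4-item tuple")
        elif value[0] not in SUPPORTED_PREFIXES:
            problems.append(f"{key}: unsupported metadata prefix {value[0]!r}")
        elif not isinstance(value[1], dict):
            problems.append(f"{key}: thumbnail settings should be a dictionary")
    return problems


problems = check_config(CONFIG_DICT)
```

Running a check like this before a transformation is one way to catch a mistyped prefix early; whether citrus validates its configuration this way is not stated here.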

REPOX_EXPORT_DIR is the path to the exported metadata. A default path is assigned when a repox data set is created (typically /repox/export). This path can be changed when data sets are exported manually.

OUTPUT_DIR is the directory where the JSON-LD will be written.

PRETTY_PRINT controls whether the resulting JSON-LD is pretty-printed. Setting this to False can reduce the size of the resulting document by up to a third. True is recommended only for non-production tasks (debugging and testing).
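The size difference can be checked with the standard json module. This sketch assumes indented output stands in for pretty printing; the record below is invented, and the actual saving depends on the data and on how citrus serializes it.

```python
import json

# Invented record; field names and URLs are placeholders.
record = {
    "@context": "http://example.org/context",
    "sourceResource": {
        "title": ["Example title"],
        "rights": ["Example rights statement"],
        "subject": [{"name": "Citrus"}, {"name": "Florida"}],
    },
    "isShownAt": "http://example.org/object/1",
}
docs = [record] * 50

# Indented output, as for debugging and testing.
pretty = json.dumps(docs, indent=2)

# Compact output with no extra whitespace, as for production.
compact = json.dumps(docs, separators=(",", ":"))

# Fraction of the pretty-printed size saved by the compact form.
saving = 1 - len(compact) / len(pretty)
print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars, saved: {saving:.0%}")
```

Both strings decode to the same data, so the choice affects file size and readability only.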

PROVIDER is the name of the DPLA hub.

VERBOSE, when set to True, prints the OAI-ID of the record currently being processed to the terminal window.

citrus transformation methods

Error logging

Errors encountered during a run are logged in an error_DATE_.log file in the run directory. Missing any of the DPLA required elements:

  • Title
  • Rights
  • Identifier referencing the object in context

will cause the record to be skipped and excluded from the final JSON-LD document. Other errors may be logged without removing the record from the transformation output.

Each error in the log file includes the full OAI-PMH identifier of the affected record, so errors can be traced and corrected.
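The skip-and-log behavior described in this section can be sketched as follows. The field names, identifiers, and message format are assumptions made for illustration, and the log is captured in memory rather than written to a file; citrus's own implementation may differ.

```python
import io
import logging

# Hypothetical field names for the three required elements.
REQUIRED = ("title", "rights", "is_shown_at")


def missing_required(record):
    """Return the required fields that are absent or empty in record."""
    return [field for field in REQUIRED if not record.get(field)]


def filter_records(records, logger):
    """Keep complete records; log and skip those missing a required element."""
    kept = []
    for oai_id, record in records:
        missing = missing_required(record)
        if missing:
            logger.error("%s: missing required element(s): %s",
                         oai_id, ", ".join(missing))
            continue
        kept.append(record)
    return kept


# Capture log output in memory for the demonstration.
log_stream = io.StringIO()
logger = logging.getLogger("citrus_sketch")
logger.setLevel(logging.ERROR)
logger.propagate = False
logger.addHandler(logging.StreamHandler(log_stream))

records = [
    ("oai:example.org:set_1", {"title": "A grove", "rights": "Example rights",
                               "is_shown_at": "http://example.org/1"}),
    ("oai:example.org:set_2", {"title": "No rights statement",
                               "is_shown_at": "http://example.org/2"}),
]
kept = filter_records(records, logger)
```

The second record is dropped, and its identifier appears in the log alongside the missing element, which is what makes the error traceable back to the source record.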

Plug-in services