Home - mrmiguez/citrus GitHub Wiki
citrus wiki
Overview
Collective Information Transformation and Reconciliation Utility Service (citrus) is a metadata aggregation and transformation tool for DPLA hubs. The program takes metadata collected by repox and transforms it into DPLA MAPv4-compatible JSON-LD for ingest into the Digital Public Library of America.
citrus is written in Python 3. It is designed to be run on the same machine hosting repox.
Goals and future development
The purpose of citrus is to give metadata aggregators a tool for flexible, automated metadata transformation.
Files & purpose
citrus.py
- defines the transformation scenarios
citrus-run.py
- runs the citrus utility
citrus_config.py
- configuration settings
assets/
- additional services for thumbnails and reconciliation functions
Configuring citrus
The transformation and plugin services to be run are defined in the citrus_config.py file.
CONFIG_DICT
is a dictionary of expected repox exports and their requisite mappings.
Keys are shortened forms of repox export directories.
They will be expanded by citrus-run using glob.glob(key*), so full names aren't required.
They should be relatively descriptive to avoid collision with other sets.
Values are tuples storing various run settings.
- metadata prefix: currently only 'dc', 'qdc', and 'mods' are supported.
- dictionary of thumbnail service values
- aggregation.dataProvider
- aggregation.intermediateProvider
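A minimal sketch of this structure follows. The key names, provider strings, and thumbnail dictionary contents are hypothetical, the tuple order is assumed to follow the list above, and expand_key is an illustrative helper rather than a citrus function; only the glob.glob(key*) expansion is taken from the description.

```python
import glob
import os
import tempfile

# Hypothetical entries. Assumed tuple order:
# (metadata prefix, thumbnail service values, dataProvider, intermediateProvider)
CONFIG_DICT = {
    "example_photos": ("mods", {"name": "example_service"},
                       "Example University", "Example Consortium"),
    "example_maps": ("qdc", {"name": "example_service"},
                     "Example County Library", "Example Consortium"),
}


def expand_key(export_dir, key):
    """Return the names of export directories that begin with `key`."""
    pattern = os.path.join(glob.escape(export_dir), key + "*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# A throwaway directory stands in for REPOX_EXPORT_DIR.
with tempfile.TemporaryDirectory() as export_dir:
    os.mkdir(os.path.join(export_dir, "example_photos_2019-01-01"))
    os.mkdir(os.path.join(export_dir, "example_maps_2019-01-01"))
    specific = expand_key(export_dir, "example_photos")  # one match
    ambiguous = expand_key(export_dir, "example")         # matches both sets

# Unpack the run settings for one set.
prefix, thumbnail_values, data_provider, intermediate_provider = CONFIG_DICT["example_photos"]
```

In this sketch the key "example" matches both export directories, which is the kind of collision a more descriptive key avoids.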
REPOX_EXPORT_DIR
is the path to the exported metadata. A default path is assigned when a repox data set is created (typically /repox/export). This path can be changed when data sets are exported manually.
OUTPUT_DIR
is the directory where the JSON-LD will be written.
PRETTY_PRINT
Pretty print the resulting JSON-LD? Setting this to False can reduce the size of the resulting document by up to a third. True is preferred only for non-production tasks (debugging and testing).
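The size difference comes from the whitespace that indentation adds. The sketch below illustrates this with the standard library's json.dumps; the record and the serialize helper are hypothetical, and this page does not show how citrus itself serializes its output.

```python
import json

# Hypothetical record; real citrus output follows DPLA MAPv4.
records = [
    {
        "@id": "example:1",
        "sourceResource": {
            "title": ["Example title"],
            "rights": ["Example rights statement"],
        },
    }
]


def serialize(docs, pretty):
    """Serialize docs as JSON, indented only when pretty is True."""
    if pretty:
        return json.dumps(docs, indent=2)
    return json.dumps(docs)


compact = serialize(records, pretty=False)
pretty = serialize(records, pretty=True)

# Same data either way; only the whitespace differs.
print(len(compact), len(pretty))
```

The actual saving depends on how deeply nested the records are, so the ratio printed here will not necessarily match the figure quoted above.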
PROVIDER
is the name of the DPLA hub.
VERBOSE
Setting this to True will print the OAI-ID of the record currently being processed to the terminal window.
citrus transformation methods
Error logging
Errors encountered during a run are logged in an error_DATE_.log file in the run directory. Missing any of the DPLA required elements:
- Title
- Rights
- Identifier referencing the object in context
will cause the record to be skipped and excluded from the final JSON-LD document. Other errors may be logged without removing the record from the transformation output.
Details of the errors in the log file include the full OAI-PMH identifier, so errors can be tracked and corrected.
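The skip-and-log behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not citrus's implementation: the field names, the record layout, and the transform_records helper are hypothetical, and the log is written to an in-memory stream rather than to a file.

```python
import io
import logging

# Hypothetical field names standing in for the three required elements.
REQUIRED = ("title", "rights", "identifier")


def transform_records(records, logger):
    """Keep records that have every required element; log and skip the rest."""
    kept = []
    for record in records:
        missing = [field for field in REQUIRED if not record.get(field)]
        if missing:
            # Include the full OAI-PMH identifier so the source can be fixed.
            logger.error("%s: missing required element(s): %s",
                         record["oai_id"], ", ".join(missing))
            continue
        kept.append(record)
    return kept


# In-memory log for the demonstration; a logging.FileHandler would write a file.
log_stream = io.StringIO()
logger = logging.getLogger("citrus_sketch")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False

records = [
    {"oai_id": "oai:example.org:set/1", "title": "A", "rights": "R",
     "identifier": "http://example.org/1"},
    {"oai_id": "oai:example.org:set/2", "title": "B", "rights": "",
     "identifier": "http://example.org/2"},
]
kept = transform_records(records, logger)
log_text = log_stream.getvalue()
```

Here the second record has an empty rights value, so it is left out of kept and its identifier appears in the log.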