Example of Automatic Importing to Hydra at WVU - wvulibraries/mfcs GitHub Wiki
Storage
In the MFCS config, the variable nfsexport allows you to define a shared export path. WVU does this with an NFS file share. Our Hydra heads also mount this share, giving us shared storage between MFCS and our Hydra head servers.
directory structure
The directory structure for each Hydra head is created with the create-directory-structure.sh script. It relies on the HYDRA_PROJECT_NAME variable to create the structure.
WVU uses the consistent naming convention of /home/HYDRA_PROJECT_NAME.lib.wvu.edu/HYDRA_PROJECT_NAME as the Rails path for each head. We also have each head in a separate container/server. This allows us to use an ENV variable to control where we find our shared resources.
The shared resources are created as:
mkdir -p /mnt/nfs-exports/mfcs-exports/"$HYDRA_PROJECT_NAME"/control/
mkdir -p /mnt/nfs-exports/mfcs-exports/"$HYDRA_PROJECT_NAME"/control/mfcs
mkdir -p /mnt/nfs-exports/mfcs-exports/"$HYDRA_PROJECT_NAME"/control/hydra/error
mkdir -p /mnt/nfs-exports/mfcs-exports/"$HYDRA_PROJECT_NAME"/control/hydra/finished
mkdir -p /mnt/nfs-exports/mfcs-exports/"$HYDRA_PROJECT_NAME"/control/hydra/in-progress
mkdir -p /mnt/nfs-exports/mfcs-exports/"$HYDRA_PROJECT_NAME"/control/hydra/staged
mkdir -p /mnt/nfs-exports/mfcs-exports/"$HYDRA_PROJECT_NAME"/export
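For scripts that need to locate the same shared directories from Ruby, the layout above could be mirrored roughly as follows. This is only a sketch: the helper names `shared_root` and `create_shared_dirs` and the `base:` keyword are hypothetical and are not part of create-directory-structure.sh. The base path and sub-directories are copied from the mkdir commands above.

```ruby
require "fileutils"

# Sub-directories copied from the mkdir commands above.
SHARED_SUBDIRS = %w[
  control/mfcs
  control/hydra/error
  control/hydra/finished
  control/hydra/in-progress
  control/hydra/staged
  export
].freeze

# Hypothetical helper: shared root for one project, e.g. ".../mfcs-exports/pec".
def shared_root(project_name, base: "/mnt/nfs-exports/mfcs-exports")
  if project_name.nil? || project_name.empty?
    raise ArgumentError, "HYDRA_PROJECT_NAME is not set"
  end
  File.join(base, project_name)
end

# Hypothetical helper: creates every sub-directory (like mkdir -p)
# and returns the list of full paths.
def create_shared_dirs(project_name, base: "/mnt/nfs-exports/mfcs-exports")
  root = shared_root(project_name, base: base)
  SHARED_SUBDIRS.map do |sub|
    path = File.join(root, sub)
    FileUtils.mkdir_p(path)
    path
  end
end

# Usage on a head, assuming the ENV variable is set:
#   create_shared_dirs(ENV["HYDRA_PROJECT_NAME"])
```

Raising when the project name is empty avoids silently creating directories directly under the export root if the ENV variable is missing.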
export script
example export script
Hydra is system agnostic. As a result, it is up to each institution to develop its own export scripts. MFCS ships with Dublin Core export scripts, but anything that does not use Dublin Core will need a custom script. An example export script for exporting the PEC collection to Hydra, using the auto-import scripts, is here:
The above script exports the metadata as JSON. If additional examples (such as exporting to XML) are needed, please contact us. We have many export scripts that export to XML, tab-delimited, and CSV formats, as well as examples of saving digital items in gzipped files for convenient downloading.
example control file
The control file is a YAML file. When it is exported from MFCS, the file is named unix_time_stamp.yaml. When it is moved to the in-progress directory, it is renamed to control_file.yaml.
---
project_name: pec
time_stamp: 1479678699
# Export Type can be
# 1. update : metadata for all objects, but not all digital items
# 1. update_full : both metadata and digital items for all objects
# 1. full : Same as update_full, but we assume that there is no data loaded
# This would be for an initial load
# 1. partial : metadata update for some items and/or some digital objects
export_type: update
digital_items_count: 22
record_count: 33
# a yaml collection to contact when the import does not succeed
contact_emails:
- [email protected]
- [email protected]
project_name : must match the HYDRA_PROJECT_NAME env variable defined on the server.
time_stamp : the Unix time when the export occurred. We use this to make sure multiple exports get processed in the correct order.
export_type : informational, for debugging.
digital_items_count : informational, for debugging. How many digital items were exported. This number depends on how the developer populated it in the export script. If there are multiple images per record, it could be a total count of all digital items or it could be the count of records that have digital items.
record_count : how many records were exported.
contact_emails : YAML list of addresses that should be emailed when the import is complete, success or failure. The first one(s) are the global system administrators. After the global emails come the emails listed in the contacts section of the permissions on a form.
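The two checks described for the control file (project_name must match the server, and time_stamp decides processing order) could be sketched in Ruby as below. This is an illustration under assumptions, not the code from the import scripts: `parse_control` and `oldest_first` are hypothetical names, and the sketch treats the remaining fields as informational, as the descriptions suggest.

```ruby
require "yaml"

# Hypothetical helper: parses control file text and checks the two fields
# that drive behaviour. Returns the parsed Hash.
def parse_control(yaml_text, expected_project)
  control = YAML.safe_load(yaml_text)
  raise ArgumentError, "control file must be a YAML mapping" unless control.is_a?(Hash)

  unless control["project_name"] == expected_project
    raise ArgumentError,
          "project_name #{control["project_name"].inspect} does not match #{expected_project.inspect}"
  end

  unless control["time_stamp"].is_a?(Integer)
    raise ArgumentError, "time_stamp must be an integer Unix time"
  end

  control
end

# Hypothetical helper: orders parsed control files so that the earliest
# export is processed first.
def oldest_first(controls)
  controls.sort_by { |control| control["time_stamp"] }
end

# Usage on a head, assuming the ENV variable is set:
#   control = parse_control(File.read("control_file.yaml"), ENV["HYDRA_PROJECT_NAME"])
```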
Importing
For automatic importing, we run a series of scripts on each Hydra head.
Note: the above script is what we use to import PEC's JSON into Hydra 7. If needed, we have examples of importing XML into Hydra 7 as well.
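To illustrate how time-stamped control files can be processed in order, here is a minimal sketch of picking up the next job. It rests on assumptions that this page does not confirm: that pending control files sit in control/hydra/staged under their unix_time_stamp.yaml names, and that only one job is in progress at a time. `claim_next_job` is a hypothetical name and is not taken from the import scripts.

```ruby
require "fileutils"

# Hypothetical helper: moves the oldest staged control file into in-progress
# as control_file.yaml. Returns the new path, or nil when nothing is staged
# or a job is already in progress.
def claim_next_job(hydra_control_dir)
  staged = File.join(hydra_control_dir, "staged")
  target = File.join(hydra_control_dir, "in-progress", "control_file.yaml")

  # Assumption: one job at a time, so wait while a control file is present.
  return nil if File.exist?(target)

  # Only consider files whose name is purely a Unix time stamp.
  candidates = Dir.glob(File.join(staged, "*.yaml")).select do |path|
    File.basename(path, ".yaml").match?(/\A\d+\z/)
  end

  oldest = candidates.min_by { |path| File.basename(path, ".yaml").to_i }
  return nil if oldest.nil?

  FileUtils.mv(oldest, target)
  target
end
```

Comparing the file names as integers rather than strings keeps the order correct even if time stamps ever differ in length.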
crontab
To get the automatic bit, everything is run via cron. This is our crontab on the PEC server:
PATH=/usr/local/sbin:/usr/sbin:/usr/local/bin:/usr/bin
*/1 * * * * ruby /opt/git_pull/hydra-import-scripts/src/crons/check-for-jobs.rb
*/5 * * * * cd /home/pec.lib.wvu.edu/pec; ruby /opt/git_pull/hydra-import-scripts/src/crons/process-jobs.rb
It is important to set the PATH env variable. If the PATH variable isn't set for cron, Rails will fail to run properly when called from the process-jobs.rb script.
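One way to make a missing PATH show up early, rather than as a confusing Rails failure, is to check for the required executables at the start of a cron-run script. The sketch below is an assumption about how such a guard could look. `find_on_path` and `missing_commands` are hypothetical helpers and are not part of process-jobs.rb.

```ruby
# Hypothetical helper: returns the full path of an executable found on the
# given search path, or nil when it is not found.
def find_on_path(command, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical helper: lists the commands that cannot be found.
def missing_commands(commands, path = ENV.fetch("PATH", ""))
  commands.reject { |command| find_on_path(command, path) }
end

# Usage at the top of a cron-run script (the command list is an assumption):
#   missing = missing_commands(%w[ruby rails])
#   abort("Not on PATH: #{missing.join(', ')}") unless missing.empty?
```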