Ingest - smith-special-collections/sc-documentation GitHub Wiki
Batch ingest with zip file
Zip ingest is limited to zip files that are less than 4096 MB. If performing a larger batch ingest, the FTP method is recommended.
-
Prepare the IMI CSV as outlined here.
-
Prepare your zip file:
- save all files in a single folder
- select all files
- Right-click and select Compress (Mac); or Send to > Compressed (zipped) folder (in Windows)
(Note: Do not make the zip archive by selecting and compressing the parent folder. The files must be at the top level of the archive.)
- Once metadata and zip file are prepared, follow steps below under Setup batch ingest in Compass
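For those who prefer to script the zip preparation, the following is a minimal sketch rather than an established part of this workflow. It assumes all files sit in one folder, writes each file at the top level of the archive, and checks the result against the 4096 MB limit. The function name `build_flat_zip` is hypothetical, and skipping hidden files such as `.DS_Store` is an added assumption, not a documented requirement.

```python
import zipfile
from pathlib import Path

MAX_ZIP_BYTES = 4096 * 1024 * 1024  # the 4096 MB limit stated above


def build_flat_zip(source_dir, zip_path):
    """Zip every file in source_dir so each sits at the top level of the archive."""
    source = Path(source_dir)
    zip_path = Path(zip_path)
    entries = sorted(source.iterdir())
    nested = [p.name for p in entries if p.is_dir()]
    if nested:
        raise ValueError(f"Subfolders are not allowed: {nested}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in entries:
            if path.name.startswith("."):
                continue  # assumption: hidden files (e.g. .DS_Store) are unwanted
            if path.resolve() == zip_path.resolve():
                continue  # do not add the archive to itself
            # arcname=path.name drops the parent folder from the stored name
            archive.write(path, arcname=path.name)
    if zip_path.stat().st_size >= MAX_ZIP_BYTES:
        raise ValueError("Zip file is 4096 MB or larger; use the FTP method instead")
    return zip_path
```

Calling `build_flat_zip("my_batch", "my_batch.zip")` with your own paths should produce an archive equivalent to selecting all files and compressing them, rather than compressing the parent folder.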
Batch ingest with FTP
This is the preferred method for ingest. It allows a much larger set of digital objects to be ingested and is a much simpler method. However, it requires a Hampshire account with access to the Hampshire VPN. You should also have your workstation's IP address registered so that you can upload files to the FTP server without connecting to the Hampshire VPN (see below). Contact Tristan Chambers to obtain an account and to have your IP address registered, and see Setup instructions.
-
Prepare the IMI CSV as outlined here.
-
Prepare files to upload. The files must be unnested (i.e. no subfolders or zipped files).
-
Open FTP client and connect to server
Note: If you have an account but are working from a computer that has not been registered, you must first connect to the Hampshire VPN before you can access the FTP server. Because you cannot be connected to the Hampshire VPN and the Smith storage servers at the same time, copy the files to a local directory or portable hard drive before uploading.
How to connect to the Hampshire VPN:
- In the bottom right of your screen, click on the Network icon.
- Click Hampshire DUO VPN
- Click Connect
- Login with your username and password
- You will receive a message on your mobile device via the DUO app, which you must accept to continue.
-
Once connected to the FTP server, navigate to Smith's staging folder: /mnt/ingest/smith (same pathway in DEV, STAGE, and PROD)
-
If you haven’t already, create a subfolder for your project. Within the project folder, the file structure must be flat, with no subfolders.
-
Upload your files to the project folder on the remote server.
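The upload steps above are normally done in a graphical FTP client. As a rough sketch only, they could also be scripted with Python's standard `ftplib`, assuming the server accepts plain FTP (if it is actually SFTP, a different library would be needed). The function names, the host, and the credentials in the usage comment are hypothetical; only the staging path `/mnt/ingest/smith` comes from this page.

```python
from ftplib import error_perm
from pathlib import Path

STAGING_ROOT = "/mnt/ingest/smith"  # Smith's staging folder, as named above


def find_nesting(local_dir):
    """Return names of subfolders or zip files, which the ingest does not accept."""
    return sorted(
        p.name for p in Path(local_dir).iterdir()
        if p.is_dir() or p.suffix.lower() == ".zip"
    )


def upload_project(ftp, local_dir, project):
    """Upload every file in local_dir to STAGING_ROOT/project.

    ftp is an already logged-in ftplib.FTP session.
    """
    problems = find_nesting(local_dir)
    if problems:
        raise ValueError(f"Remove subfolders and zip files first: {problems}")
    ftp.cwd(STAGING_ROOT)
    try:
        ftp.mkd(project)  # create the project subfolder
    except error_perm:
        pass  # assumed to mean the folder already exists
    ftp.cwd(project)
    uploaded = []
    for path in sorted(Path(local_dir).iterdir()):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


# Usage (hypothetical host, credentials, and paths; substitute your own):
#   from ftplib import FTP
#   with FTP("ftp.example.org") as ftp:
#       ftp.login("username", "password")
#       upload_project(ftp, "/path/to/prepared/files", "my_project")
```

Checking for nesting before connecting means a folder with subfolders or zipped files is rejected locally, before anything reaches the staging server.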
Setup batch ingest in Compass:
-
Once you have uploaded your files to the FTP server, or prepared your zip file for upload, log in to Compass STAGE or PROD.
-
Scroll to the footer menu, and select "Multi Import Objects."
-
Choose Data type: "Spreadsheet file to be uploaded" and click "Next." Click "Choose file" and browse to find your metadata CSV file, then click "Upload." Once uploaded, select "Preprocess."
-
Once the metadata has been preprocessed, you will be taken to the "Import settings" page to define the metadata template, datastream and object properties mappings.
-
From the tabbed menu, select the "Templating" tab. At the bottom of the page, under "Load Existing Template," select Twig_Temp_2018_08_22.
-
Then, from the tabbed menu, select "CMODEL Mapping." Via this tab, you will be configuring the content models for the objects, as defined in the metadata CSV.
a. From the drop-down menu, select cmodel, then click Check CMODELS.
b. For each available field, under each content model section, select the following entries for each datastream:
Islandora Internet Archive Book Content Model or Compound Content Model
Datastream | Drop-down selection | Notes |
---|---|---|
DC | Select "default XSLT". | |
MODS | Select "Twig_Temp_2018_08_22". | |
TN | Select "build using islandora generated derivative". | |
Select "Don't Create" | This doesn't work for BookCM, only for PDF ContentModel |
Islandora Page Content Model or Islandora Newspaper Page Content Model
Datastream | Drop-down selection | Notes |
---|---|---|
DC | Select "default XSLT". | |
OBJ | Select "obj_file". | |
JP2 | Select "build using derivative from OBJ". | |
TN | Select "build using derivative from OBJ". | |
JPG | Select "build using derivative from OBJ". | |
OCR | Select "Don't Create" | This option doesn't work in multi-importer |
HOCR | Select "Don't Create" | This option doesn't work in multi-importer |
Islandora Large Image Content Model
Datastream | Drop-down selection | Notes |
---|---|---|
DC | Select "default XSLT". | |
MODS | Select "Twig_Temp_2018_08_22". | |
OBJ | Select "obj_file". | |
JP2 | Select "build using derivative from OBJ". | |
TN | Select "build using derivative from OBJ". | |
JPG | Select "build using derivative from OBJ". | |
Islandora PDF Content Model
Datastream | Drop-down selection | Notes |
---|---|---|
DC | Select "default XSLT" | |
MODS | Select "Twig_Temp_2018_08_22" | |
OBJ | Select "obj_file" | |
PDFA | Select "obj_file" | |
FULL_TEXT | Select "build using derivative from OBJ" | |
TN | Select "build using derivative from OBJ" | |
PREVIEW | Select "build using derivative from OBJ" | |
- Once you have designated how to build each datastream, select the "Object Properties Mapping" tab. For each field, select the following:
Object Properties Mapping
Field | Drop-down selection | Notes |
---|---|---|
Object PID | Select "parent"; and check the box to let Islandora build a PID for you. | |
Parent object | Select "parent". | |
Object label | Select "title". | |
Sequence and ordering | Select "sequence". | Specifies which field is to be used as an index of sequence order for multi-child objects. |
Remote DS sources | For zip file upload select "zip"; for FTP ingests, select "local" | Specifies the location from which the objects are being made accessible for ingest. |
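The drop-down selections in the tables above (`cmodel`, `obj_file`, `parent`, `title`, `sequence`) appear to refer to columns in the metadata CSV. Assuming that is the case, a quick pre-upload check like the sketch below could catch a missing column before preprocessing. The function name `missing_columns` is hypothetical, and the expected column set is inferred from this page rather than from an official specification.

```python
import csv

# Names taken from the drop-down selections in the tables above;
# treating them as CSV column headers is an assumption.
EXPECTED_COLUMNS = {"cmodel", "parent", "title", "sequence", "obj_file"}


def missing_columns(csv_path):
    """Return the expected column names that are absent from the CSV header row."""
    # utf-8-sig strips a byte-order mark if a spreadsheet tool added one
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return sorted(EXPECTED_COLUMNS - present)
```

An empty list from `missing_columns("metadata.csv")` means every expected header was found; any names returned are the ones to add or rename before uploading.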
- Click "Ingest."
For zip file ingests:
- On the next page, browse to your zip file on your computer, click "Upload," then, once uploaded, click "Ingest."
- The objects and metadata have now been placed into a queue for ingest.
- At this point the workflow depends on which type of ingest you're doing:
-
For a larger ingest (~50 objects or more):
- Let the Compass Ingest Coordinator (IC; currently Rachel Leach at MHC) know that the batch is ready for ingest by sending a direct message on Slack with the batch set number and urgency
- Fill out the Compass ingest manifest (a.k.a. Islandora ingest queue)
- Skip to Quality control (step 14 below)
-
For a smaller, front-end ingest (less than ~50 objects) follow the steps below:
-
Before ingesting:
- Confirm that derivative generation is enabled in the platform (PROD or STAGE); check the #compass-announce channel on Slack for current status. If it is disabled, ask the IC to enable it
- Confirm that the content model you're using is enabled for the collection
- Send a message on Slack channel #compass-announce, e.g.: "Running a 50-object front-end ingest on PROD"
-
Start the ingest: select "Islandora Batch Sets" in the footer.
On the Islandora Batch Ingest Sets page, find the set that you just created. On the right, select "View Items in set."
On the Batch Queue page for the set, you should see a list of your items with the State: "Ready to ingest."
Click "Process Set" at top, then, on the next page, click "Start Batch Processing" at the bottom. This will begin the ingest and object record creation process.
(Wait.... depending on the size of your ingest and files, processing may take some time!)
-
Once you've completed the ingest, or received confirmation from the ingest coordinator that the ingest is completed, do Quality Control and troubleshooting on your objects.
-
After ingest and QC are completed, delete your batch set and remove files from the FTP server (if applicable):
- Deleting a batch set from Compass:
- Go to “Islandora Batch sets” (in footer)
- Locate the set and click “View Items in set”
- Click “Delete Set”
- Click “Confirm”
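The steps above cover deleting the batch set. For removing the uploaded files from the FTP server, a graphical FTP client works; the sketch below shows one possible scripted equivalent using Python's standard `ftplib`. It assumes plain FTP, assumes the server's `NLST` listing returns bare file names, and uses the hypothetical function name `remove_project_files`. Only the staging path comes from this page. Deletion is permanent, so run it only after QC is complete.

```python
from ftplib import error_perm

STAGING_ROOT = "/mnt/ingest/smith"  # Smith's staging folder, as named above


def remove_project_files(ftp, project):
    """Delete every file in STAGING_ROOT/project.

    ftp is an already logged-in ftplib.FTP session.
    """
    ftp.cwd(f"{STAGING_ROOT}/{project}")
    try:
        names = ftp.nlst()
    except error_perm:
        names = []  # some servers report an empty folder as an error
    removed = []
    for name in names:
        if name in (".", ".."):
            continue  # skip directory entries some servers include
        ftp.delete(name)
        removed.append(name)
    return removed
```

The function returns the list of deleted names so it can be compared against the batch before the project folder is considered clean.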
One-off ingest
This approach should be used when ingesting one or a small number of objects into Compass. This approach utilizes the metadata forms that are available in Compass. The forms available are out-of-the-box and have not yet been customized to Compass and the partner institutions. Guidelines for creating and working with XML forms may be found here: https://github.com/Islandora/islandora/wiki/Creating-and-Working-With-XML-Forms. Creation of new forms and updates to existing forms must be done in conversation with the Compass Metadata Committee.
This method is not recommended for archival collections, as metadata for archival components, including their component URI, should be exported from ArchivesSpace, then ingested into Compass.
-
Scroll to the footer menu, and select Islandora Repository.
-
Choose a top-level collection from the silver folder icons. At this level, the collections correspond with the partner institution.
-
Continue to drill down into the partner institution's folder to the specific collection folder, within which you would like to add an object.
-
Once in the folder, select the Manage tab, underneath the navigation breadcrumbs. This will take you to the backend management dashboard.
- Select Add an Object to this Collection.
- On the next page, select a content model to ingest from the drop-down list. Click Next.
- Note: If the content model you wish to associate with an object does not appear in the drop-down list, you must update the collection's collection policy.
-
If you are ingesting the PDF content model, the next page will ask you to Select a Form. Select "PDF MODS form" and click Next.
-
On the next page, unless you are importing MARCXML, skip this step and click Next.
-
In the metadata form, enter the Title (required). If using the PDF model, put the Digital Object identifier in the note field. If using the Book Object model, put it in the local identifier field. Click Next.
-
Choose the file, associated with the metadata, that you wish to upload. Once chosen, select Upload.
-
For PDF ingest:
- Select default Image settings (24-bit and 300 DPI).
- For PDFs with OCR, select "Extract text from PDF" (it is not recommended to "Perform OCR during ingest" as it will slow down processing considerably; it is better to run OCR on the PDFs prior to ingesting)
- Select language (default: English)
There are no settings to select for image (TIFF) file ingests.
-
Click Ingest.
-
When ingest is complete, you will be taken to the object's record.
- Perform Quality Control on your objects.