Importing Data - AtlasOfLivingAustralia/profile-hub GitHub Wiki

Questions to be asked of the customer:

  1. What is the name of your collection/organisation (this is used to create the Collectory Data Resource)?
  2. What is the ALA user name of the Administrator(s) for the collection?
  3. What NSL name and nomenclature matching strategy do you wish to use (see the Name Matching page)?
  4. Do you wish to bulk import images?
    1. If yes, then the customer will need to provide metadata for each image:
      1. At a minimum, the customer needs to provide the URL to download the image (upload of image files is not currently supported) and the image title.
      2. Creator (Photographer or illustrator), Rights Statement, Rights Holder and Licence (Creative Commons variant) should also be provided, but are not mandatory.

Creating a new collection

This section lists the steps required to create a completely new collection and import data. You will need to be an ALA Administrator to create a new collection.

  1. Create a Data Resource in the ALA Collections admin UI for the organisation
  2. Create a new collection in Profiles Hub, selecting the data resource you just created
    1. Add the administrator(s) via the Access Control panel in the collection administration screen.
  3. Create a script that parses your existing data set and produces the JSON document to be sent to the import profile web service
  4. Generate a CSV file containing a mapping between Scientific Name and URL for each image in your dataset
  5. Execute the script against the profile service's web service
  6. Upload the image mapping CSV file (from step 4) to the ALA Collections admin interface, then trigger the ingest process.
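
As an illustration of step 4, the following is a minimal sketch of writing the Scientific Name to image URL mapping as CSV. The column headers (scientificName, imageUrl), the function name and the example URL are assumptions made for this sketch; check the headers expected by the ALA Collections ingest process before using it.

```python
import csv
import io


def write_image_mapping(rows, out):
    """Write (scientific name, image URL) pairs as CSV to a file-like object."""
    writer = csv.writer(out)
    # Header names are assumed for illustration, not taken from the ALA docs.
    writer.writerow(["scientificName", "imageUrl"])
    for name, url in rows:
        writer.writerow([name.strip(), url.strip()])


# Example: write to an in-memory buffer; pass an open file instead to save to disk.
buffer = io.StringIO()
write_image_mapping(
    [("Acacia dealbata", "https://example.org/images/acacia-dealbata.jpg")],
    buffer,
)
csv_text = buffer.getvalue()
```

To write to disk, open the target with open(path, "w", newline="") so the csv module controls the line endings.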

API

The Profile Service provides an API for importing profiles into the system.

The main web service is [host]/import/profile, which takes a JSON request (as a POST) with the following structure:

{
    "opusId": "",
    "profiles":[{
        "scientificName": "",
        "nameAuthor": "",
        "fullName": "",
        "enableNSLMatching": "",
        "nslNameIdentifier": "",
        "nslNomenclatureIdentifier": "",
        "nslNomenclatureMatchStrategy": "",
        "nslNomenclatureMatchData": [""],
        "links":[{
            "creators": [""],
            "edition": "",
            "title": "",
            "publisherName": "",
            "fullTitle":"",
            "description": "",
            "url":"",
            "doi":""
            }],
        "bhl":[{
            "creators": [""],
            "edition": "",
            "title": "",
            "publisherName": "",
            "fullTitle":"",
            "description": "",
            "url":"",
            "doi":""
            }],
        "attributes": [{
            "creators": [""],
            "editors": [""],
            "title": "",
            "text": ""
            }]
    }]
}

NOTES:

  • The import service will not allow two profiles with the same scientific name.
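
A parsing script could assemble the request document along these lines. This is a sketch only: the input record shape (a dict with scientificName, an optional nameAuthor and an optional title-to-text attributes mapping) is hypothetical, only a few of the fields listed above are filled in, and which fields the service treats as mandatory is not stated here. Duplicate names are rejected up front using an exact match after trimming whitespace, which may be stricter or looser than the service's own check.

```python
def build_import_payload(opus_id, records):
    """Build the import request body, refusing duplicate scientific names."""
    seen = set()
    profiles = []
    for record in records:
        name = record["scientificName"].strip()
        # Assumption: exact string comparison; the service may compare differently.
        if name in seen:
            raise ValueError(f"Duplicate scientific name: {name}")
        seen.add(name)
        profiles.append({
            "scientificName": name,
            "nameAuthor": record.get("nameAuthor", ""),
            "attributes": [
                {"title": title, "text": text, "creators": [], "editors": []}
                for title, text in record.get("attributes", {}).items()
            ],
        })
    return {"opusId": opus_id, "profiles": profiles}


payload = build_import_payload(
    "my-opus-id",  # placeholder; use the ID of your collection
    [{"scientificName": "Acacia dealbata",
      "attributes": {"Description": "A tree."}}],
)
```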

The import process is asynchronous. The response from this service will be a small JSON document with an ID and a status. For example:

{
    "status": "IN_PROGRESS",
    "id": "9fb23e4a-e8ef-4e1e-8a28-6187d16edf9d",
    "report": ""
}
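
Submitting the document could look like the sketch below, which uses only the Python standard library. The host value and the function names are placeholders, and the Content-Type header is an assumption; any authentication the service may require is not covered here and is left out.

```python
import json
import urllib.request


def build_import_request(host, payload):
    """Prepare a POST to [host]/import/profile carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{host}/import/profile",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def submit_import(host, payload):
    """Send the request and return the parsed response (status, id, report)."""
    with urllib.request.urlopen(build_import_request(host, payload)) as response:
        return json.load(response)


# Building the request does not contact the server; submit_import does.
request = build_import_request(
    "https://profiles.example.org",  # placeholder host
    {"opusId": "my-opus-id", "profiles": []},
)
```

Keep the id from the response of submit_import, since it is needed to fetch the import report.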

The ID can be used to poll for the import report using the service [host]/import/[ID]/report (e.g. import/9fb23e4a-e8ef-4e1e-8a28-6187d16edf9d/report). Poll periodically until the status is COMPLETE.
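
The polling loop might be sketched as follows. The interval, the attempt limit and the function names are arbitrary choices for this example, and the sketch assumes a plain GET with no authentication. It only recognises the COMPLETE status described here, so any other terminal status the service may return would simply run into the timeout.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_report(host, import_id, fetch=fetch_json, interval=5.0, max_attempts=60):
    """Poll [host]/import/[ID]/report until the status is COMPLETE."""
    url = f"{host}/import/{import_id}/report"
    for _ in range(max_attempts):
        result = fetch(url)
        if result.get("status") == "COMPLETE":
            return result
        time.sleep(interval)
    raise TimeoutError(f"Import {import_id} not complete after {max_attempts} polls")
```

The fetch parameter exists so the loop can be exercised without a live server, for example by passing a function that returns canned responses.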

The response from the import report service will be a JSON document mapping Scientific Name to a status, with any warnings or errors that may have occurred. For example:

{
    "status": "COMPLETE",
    "id": "9fb23e4a-e8ef-4e1e-8a28-6187d16edf9d",
    "report": {
        "scientificName": {
            "status": "success|warning|error",
            "warnings": [],
            "errors": []
        },
        ...
    }
}
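
Once the import is complete, the report can be summarised per status. The sketch below assumes the report field is a JSON object keyed by scientific name, as the description above suggests; the function name and sample data are made up for illustration.

```python
def summarise_report(report):
    """Group scientific names by their reported status."""
    grouped = {}
    for name, entry in report.items():
        # Entries without a status are collected under "unknown".
        grouped.setdefault(entry.get("status", "unknown"), []).append(name)
    return grouped


sample_report = {
    "Acacia dealbata": {"status": "success", "warnings": [], "errors": []},
    "Acacia sp.": {"status": "error", "warnings": [], "errors": ["example error"]},
}
summary = summarise_report(sample_report)
```

Names listed under error or warning are the ones to inspect before re-running the import for those profiles.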