Darwin Core Archive Event Core - AtlasOfLivingAustralia/biocollect GitHub Wiki

Event Core

BioCollect hosts a large amount of data stored in an internal format. This wiki page describes how that data is transformed into the Darwin Core Archive (DwCA) event core format. The data is structured as follows: a citizen science project has multiple surveys (internally called project activities), each of which has multiple site visits (internally called activities).

BioCollect generates one archive per project. Each archive includes the following files.

eml.xml - Contains the project metadata.
meta.xml - Describes the content of the CSV files.
Event.csv - Contains all site visits (activities). Each survey (project activity) is also added here, with eventType Survey.
MeasurementOrFact.csv - Contains measurements or facts from site visits.
Media.csv - Contains images from site visits.
Occurrence.csv - Contains species occurrences.
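
As a quick sanity check, a downloaded archive can be inspected for the files listed above. The following is a minimal sketch, assuming the files sit at the top level of the ZIP; the function name `missing_files` is illustrative and not part of BioCollect or ecodata.

```python
import io
import zipfile

# File names taken from the list above.
# Assumption: they are stored at the root of the archive.
EXPECTED_FILES = {
    "eml.xml",
    "meta.xml",
    "Event.csv",
    "MeasurementOrFact.csv",
    "Media.csv",
    "Occurrence.csv",
}


def missing_files(archive):
    """Return the expected file names that are absent from the archive.

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        present = set(zf.namelist())
    return sorted(EXPECTED_FILES - present)


# Example: an in-memory archive that lacks Media.csv.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("eml.xml", "meta.xml", "Event.csv",
                 "MeasurementOrFact.csv", "Occurrence.csv"):
        zf.writestr(name, "")
buffer.seek(0)
print(missing_files(buffer))  # ['Media.csv']
```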

How to configure BioCollect to generate DwCA?

The BioCollect DwCA creator cannot work out the field mapping on its own, so an admin has to configure it on the form template of each survey. Each dataModel entry you want to include in the DwCA must be annotated with the property dwcAttribute, whose value is the DwC field it maps to. This reuses the existing attributes used for record creation. An annotated example of a dataModel entry is given below.

    {
      "dataType": "text",
      "name": "author",
      "dwcAttribute": "recordedBy",
      "description": "The name of the person submitting this record",
      "validate": "required"
    }

Here, the value entered in the author field is assigned to the DwC field recordedBy.
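
To illustrate the idea, the annotation can be read as a lookup from form field names to DwC terms. The sketch below is not the ecodata implementation; the function names `dwc_mapping` and `to_dwc` and the sample record are hypothetical, and it assumes nested fields are stored under a `columns` key.

```python
def dwc_mapping(data_model):
    """Map each annotated field name to its DwC term.

    Assumption: nested fields (for example table columns) live under "columns".
    """
    mapping = {}
    for field in data_model:
        term = field.get("dwcAttribute")
        if term:
            mapping[field["name"]] = term
        # Recurse into child fields, if any.
        mapping.update(dwc_mapping(field.get("columns", [])))
    return mapping


def to_dwc(record, mapping):
    """Rename the keys of a submitted record to DwC terms, dropping unannotated fields."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


data_model = [
    {"dataType": "text", "name": "author", "dwcAttribute": "recordedBy"},
    {"dataType": "text", "name": "notes"},  # not annotated, so it is skipped
]
mapping = dwc_mapping(data_model)
print(to_dwc({"author": "Jane Citizen", "notes": "windy"}, mapping))
# {'recordedBy': 'Jane Citizen'}
```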

Adding MeasurementOrFact

Similarly, a measurement or fact is added by assigning "dwcAttribute": "measurementValue". An example is given below. As you can see, all the associated values that go with a measurement or fact are added to the dataModel.

    {
      "dataType": "number",
      "name": "spiValue",
      "dwcAttribute": "measurementValue",
      "measurementUnit": "SPI",
      "measurementUnitID": "http://qudt.org/vocab/quantitykind/SPI",
      "measurementType": "number",
      "measurementTypeID": "http://qudt.org/vocab/quantitykind/Number",
      "measurementAccuracy": "0.1",
      "description": "Calculated stream pollution index (SPI)"
    }
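
One way to picture the result is a single MeasurementOrFact row per annotated field, combining the submitted value with the measurement properties from the dataModel. This is a hedged sketch only: the helper `measurement_row`, the `eventID` link column, and the sample values are assumptions, not taken from the BioCollect code.

```python
# Measurement properties copied from the dataModel entry when present.
MEASUREMENT_KEYS = (
    "measurementType",
    "measurementTypeID",
    "measurementUnit",
    "measurementUnitID",
    "measurementAccuracy",
)


def measurement_row(field, value, event_id):
    """Build one MeasurementOrFact row, or None if the field is not a measurement.

    Assumption: rows are linked to the site visit through an eventID column.
    """
    if field.get("dwcAttribute") != "measurementValue":
        return None
    row = {"eventID": event_id, "measurementValue": value}
    for key in MEASUREMENT_KEYS:
        if key in field:
            row[key] = field[key]
    return row


field = {
    "dataType": "number",
    "name": "spiValue",
    "dwcAttribute": "measurementValue",
    "measurementUnit": "SPI",
    "measurementType": "number",
    "measurementAccuracy": "0.1",
    "description": "Calculated stream pollution index (SPI)",
}
# "visit-1" is a made-up identifier for illustration.
print(measurement_row(field, 4.2, "visit-1"))
```

Properties such as description are left out of the row because they describe the form field rather than the measurement.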

Special case

BioCollect can also hold a table of measurement or fact values. The combination of the header row and the first column is used to derive the measurement type, so the measurement type has to be derived programmatically. Below is an example of one such case.

    {
      "dataType": "list",
      "name": "dominantPlantSpeciesPreIntervention",
      "columns": [
        {
          "dataType": "species",
          "description": "The dominant plant species on the site at the time of commencement of the intervention works. [LIST UP TO 4 SPECIES PER STRATUM]",
          "name": "dominantSpeciesPreIntervention",
          "dwcAttribute": "scientificName"
        },
        {
          "dataType": "text",
          "description": "The vegetation stratum occupied by the species in its mature state.",
          "name": "dominantSpeciesPreInterventionStratum",
          "constraints": [
            "Canopy",
            "Midstory",
            "Ground stratum"
          ],
          "dwcAttribute": "measurementValue",
          "measurementUnit": "unitless",
          "measurementType": "${dominantSpeciesPreIntervention.name} - Stratum"
        }
      ]
    }
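
The measurementType template above can be read as a placeholder that is filled in per table row. The sketch below shows one plausible way to resolve it, assuming `${column.property}` refers to a property of another column's value in the same row; the function `resolve_measurement_type` and the sample row are hypothetical and may differ from what ecodata actually does.

```python
import re

# Matches placeholders of the form ${columnName.property}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve_measurement_type(template, row):
    """Replace each ${column.property} placeholder with the value found in the row."""
    def lookup(match):
        column, prop = match.group(1), match.group(2)
        return str(row[column][prop])
    return PLACEHOLDER.sub(lookup, template)


# Made-up table row: the species column holds an object with a name property.
row = {
    "dominantSpeciesPreIntervention": {"name": "Eucalyptus melliodora"},
    "dominantSpeciesPreInterventionStratum": "Canopy",
}
template = "${dominantSpeciesPreIntervention.name} - Stratum"
print(resolve_measurement_type(template, row))
# Eucalyptus melliodora - Stratum
```

With this reading, each row yields its own measurement type, and the stratum value ("Canopy" here) becomes the measurementValue.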

How to access DwCA?

The DwCA file has to be accessed via ecodata, BioCollect's back-end system. Use the following APIs.

1. Get auth token:

URL: https://ecodata-dev.ala.org.au/user/getKey
Header: 
userName : <ALA registered email>
password: <ALA password>

2. Get the list of data resources available for harvesting

URL : https://ecodata-dev.ala.org.au/ws/record/listHarvestDataResource?max=10&offset=0&sort=asc
Header:
authKey : <Obtained from Step 1>
userName: <Username associated with the auth key>

The response looks like the example below. Use the value of the archiveURL property to generate the archive file.

{
    "total": 71,
    "list": [
        {
            "projectId": "17a7871e-15cd-43a3-b349-1161778b0aed",
            "name": "Superb Parrot Monitoring project",
            "dataResourceId": "dr5017",
            "dataProviderId": "dp3534",
            "status": "active",
            "alaHarvest": true,
            "archiveURL": "https://ecodata-dev.ala.org.au/ws/project/17a7871e-15cd-43a3-b349-1161778b0aed/archive"
        },
       ………
    ]
}
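
A response like the one above can be reduced to the archive URLs with a few lines of code. This is a minimal sketch; the helper names are illustrative, and `page_offsets` assumes that `offset` counts items rather than pages, which the source does not state.

```python
import json


def archive_urls(response_text):
    """Return a {dataResourceId: archiveURL} mapping from a listing response."""
    payload = json.loads(response_text)
    return {
        resource["dataResourceId"]: resource["archiveURL"]
        for resource in payload.get("list", [])
    }


def page_offsets(total, page_size):
    """Offsets needed to walk the whole listing, page_size entries at a time.

    Assumption: the offset parameter is item-based.
    """
    return list(range(0, total, page_size))


sample = """
{
    "total": 71,
    "list": [
        {
            "projectId": "17a7871e-15cd-43a3-b349-1161778b0aed",
            "name": "Superb Parrot Monitoring project",
            "dataResourceId": "dr5017",
            "dataProviderId": "dp3534",
            "status": "active",
            "alaHarvest": true,
            "archiveURL": "https://ecodata-dev.ala.org.au/ws/project/17a7871e-15cd-43a3-b349-1161778b0aed/archive"
        }
    ]
}
"""
print(archive_urls(sample))
print(page_offsets(71, 10))  # [0, 10, 20, 30, 40, 50, 60, 70]
```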

3. Get DwC archive

URL : https://ecodata-dev.ala.org.au/ws/project/17a7871e-15cd-43a3-b349-1161778b0aed/archive
Header: 
authKey : <Obtained from Step 1>
userName: <Username associated with the auth key>

Note: creating the archive can take several minutes depending on the number of activities in a project. In the next phase, I will improve the code to make it faster.
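
For reference, the download in Step 3 could be scripted roughly as follows. This is a sketch under assumptions: the source does not state the HTTP method, so a plain GET is assumed, and the function names and the generous timeout (chosen because archive creation can take several minutes) are illustrative rather than part of ecodata.

```python
import shutil
import urllib.request


def build_archive_request(archive_url, auth_key, user_name):
    """Build the request for Step 3 with the authKey and userName headers.

    Assumption: the endpoint accepts a GET request.
    Note: urllib normalises header capitalisation; HTTP header names are case-insensitive.
    """
    return urllib.request.Request(
        archive_url,
        headers={"authKey": auth_key, "userName": user_name},
    )


def download_archive(request, destination, timeout=600):
    """Stream the archive to a local file and return its path."""
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


request = build_archive_request(
    "https://ecodata-dev.ala.org.au/ws/project/17a7871e-15cd-43a3-b349-1161778b0aed/archive",
    "<Obtained from Step 1>",
    "<Username associated with the auth key>",
)
# download_archive(request, "archive.zip")  # performs the network call
```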