Harvest Workflows - QutEcoacoustics/baw-server GitHub Wiki

Our new harvester APIs have been designed to work in two ways: streaming mode or batch mode.

Streaming mode is intended for use by automated sensors that send audio back to the workbench as they collect it. It is the stricter form of the harvest modes and does not support corrective actions or free-form arrangements of files. Any files that fail validation are simply ignored.

Batch mode is intended for direct use by users. It supports corrective actions by users when validations fail, and allows for a free-form upload of files.

Stages

A harvest model is a state machine that moves through the stages listed below. Some transitions happen automatically; others must be requested manually by a client.

| status | streaming | description |
|---|---|---|
| :new_harvest | true/false | Initial state, used only server side |
| :uploading | true/false | A software-defined SFTP service is created and enabled for uploads |
| :scanning | false | SFTP connection is disabled, our workers scan for files |
| :metadata_extraction | false | SFTP connection is disabled, our workers validate files |
| :metadata_review | false | Users can review the state of the harvest. Corrective actions can be taken. The harvest can be transitioned back to :uploading or :metadata_extraction. |
| :processing | false | The final ingestion phase |
| :complete | false | A final state. Once a harvest enters this phase it cannot transition to any other phase. The SFTP connection is deleted and can never be reactivated. |

State machine diagram

stateDiagram-v2

    state choice <<choice>>
    new_harvest -->  choice:⏭️
    
    choice --> uploading: if !streaming ⏭️
    choice --> uploading_streaming: if streaming ⏭️

    uploading --> scanning: 🔡
    scanning --> metadata_extraction: ⏭️
    metadata_extraction --> metadata_review: ⏭️
    metadata_review --> uploading: 🔡
    metadata_review --> metadata_extraction: 🔡
    metadata_review --> processing: 🔡
    processing --> complete: ⏭️

    uploading_streaming : uploading
    uploading_streaming --> complete: 🔡

    [*] --> complete: ❗abort

  • ⏭️ denotes an automatic transition
  • 🔡 denotes a manual transition that must be requested by a client with a PATCH request that modifies status
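
The diagram and legend above can be read as a lookup table. The following is a minimal Python sketch of that reading, not the server's implementation: the names `BATCH_TRANSITIONS`, `STREAMING_TRANSITIONS`, `transition_kind`, and `client_can_request` are hypothetical, and the abort edge is left out because this page does not say how an abort is triggered.

```python
from typing import Optional

# Transition kinds, mirroring the legend: automatic (server-driven)
# or manual (requested by a client PATCH that modifies status).
AUTO = "automatic"
MANUAL = "manual"

# (from_status, to_status) -> kind, transcribed from the diagram for batch harvests.
BATCH_TRANSITIONS = {
    ("new_harvest", "uploading"): AUTO,
    ("uploading", "scanning"): MANUAL,
    ("scanning", "metadata_extraction"): AUTO,
    ("metadata_extraction", "metadata_review"): AUTO,
    ("metadata_review", "uploading"): MANUAL,
    ("metadata_review", "metadata_extraction"): MANUAL,
    ("metadata_review", "processing"): MANUAL,
    ("processing", "complete"): AUTO,
}

# Streaming harvests only upload and then complete.
STREAMING_TRANSITIONS = {
    ("new_harvest", "uploading"): AUTO,
    ("uploading", "complete"): MANUAL,
}


def transition_kind(current: str, target: str, streaming: bool) -> Optional[str]:
    """Return "automatic", "manual", or None if the diagram has no such edge."""
    table = STREAMING_TRANSITIONS if streaming else BATCH_TRANSITIONS
    return table.get((current, target))


def client_can_request(current: str, target: str, streaming: bool) -> bool:
    """True when the diagram marks current -> target as a manual transition."""
    return transition_kind(current, target, streaming) == MANUAL
```

A client could consult `client_can_request` before sending a PATCH, so that it never asks for a transition the diagram does not allow.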

Streaming Upload

  1. (new_harvest) Create a new harvest

    POST /projects/:projectId/harvests
    {
      "harvest": {
        "streaming": true
      }
    }
  2. (uploading) Upload files using the SFTPGo login details from the previous request

    • New directories cannot be made. Files must be uploaded into existing sub-directories.
    • During upload process, show harvest report to user
    • During upload process, show harvested audio files to user
  3. (uploading) Complete the harvest

    PATCH /projects/:projectId/harvests/:harvestId
    {
      "harvest": {
        "status": "complete"
      }
    }
  4. (complete) Show final statistics from the harvest report
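
The two HTTP requests in the streaming steps above can be described in Python as plain data before being handed to whatever HTTP client is in use. This is a sketch under assumptions: the `Request` tuple and the helper names `create_harvest` and `set_status` are hypothetical, and authentication, headers, and the response format are not covered on this page.

```python
import json
from typing import NamedTuple


class Request(NamedTuple):
    """A transport-agnostic description of one API call."""
    method: str
    path: str
    body: str  # JSON-encoded request body


def create_harvest(project_id: int, streaming: bool) -> Request:
    """Step 1: describe the POST that creates a new harvest."""
    body = json.dumps({"harvest": {"streaming": streaming}})
    return Request("POST", f"/projects/{project_id}/harvests", body)


def set_status(project_id: int, harvest_id: int, status: str) -> Request:
    """Describe the PATCH that requests a manual status transition."""
    body = json.dumps({"harvest": {"status": status}})
    return Request("PATCH", f"/projects/{project_id}/harvests/{harvest_id}", body)
```

For a streaming harvest, `create_harvest(project_id, streaming=True)` corresponds to step 1 and `set_status(project_id, harvest_id, "complete")` to step 3. The same `set_status` helper would also cover the manual transitions of a batch harvest.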

Batch Upload

  1. (new_harvest) Create a new harvest
    POST /projects/:projectId/harvests
    {
      "harvest": {
        "streaming": false
      }
    }
  2. (uploading) Upload files using the SFTPGo login details from the previous request
  3. (uploading) Transition to the scanning stage, which leads on to metadata extraction
    PATCH /projects/:projectId/harvests/:harvestId
    {
      "harvest": {
        "status": "scanning"
      }
    }
  4. (scanning) Poll for updates. The harvest will transition to the next state automatically.
  5. (metadata_extraction) Poll for updates. The harvest will transition to the next state automatically.
  6. (metadata_review) Review changes. Three courses of action available:
    1. Change a directory mapping to add metadata to files
      • (metadata_review) Fix file mappings
        PATCH /projects/:projectId/harvests/:harvestId
        {
            "harvest": {
                "mappings": [
                    ...harvest.mappings,
                    {
                        "site_id": 1,
                        "path": "path/to/folder",
                        "utc_offset": "+10:00",
                        "recursive": true
                    }
                ]
            }
        }
      • Then transition back to the metadata extraction stage ⤴️
        PATCH /projects/:projectId/harvests/:harvestId
        {
            "harvest": {
                "status": "metadata_extraction"
            }
        }
    2. Files need to be changed or rearranged
      • Transition back to the uploading stage ⤴️
        PATCH /projects/:projectId/harvests/:harvestId
        {
            "harvest": {
                "status": "uploading"
            }
        }
    3. Ready to advance
      • Any fixable errors that have not yet been fixed will be processed as failures
      • Any non-fixable errors will be processed as failures
      • Any files that have no errors should be harvested successfully
      • Transition to processing stage
        PATCH /projects/:projectId/harvests/:harvestId
        {
            "harvest": {
                "status": "processing"
            }
        }
  7. (processing) Poll for updates. The harvest will transition to the next state automatically.
  8. (complete) Show final statistics
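
In the mapping fix of step 6, the `...harvest.mappings` line means the existing mappings are sent back together with the new one. A minimal Python sketch of building that body follows. It assumes the harvest has already been fetched and parsed into a dict with a `mappings` list; the helper name `add_mapping_payload` is hypothetical.

```python
from typing import Any, Dict, List


def add_mapping_payload(harvest: Dict[str, Any], mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Build the PATCH body that appends `mapping` to the harvest's existing mappings."""
    # Copy the current list so the caller's harvest dict is not mutated.
    existing: List[Dict[str, Any]] = list(harvest.get("mappings") or [])
    return {"harvest": {"mappings": existing + [mapping]}}


# Example mapping using the fields shown in the request above.
new_mapping = {
    "site_id": 1,
    "path": "path/to/folder",
    "utc_offset": "+10:00",
    "recursive": True,
}
```

Sending the full list rather than only the new entry matches the request shown above, where the existing mappings are spread into the array ahead of the new entry.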

General API Queries

Poll Harvest Reports

GET /projects/:projectId/harvests/:harvestId
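
The polling steps of the batch workflow (scanning, metadata_extraction, and processing) could be driven by a loop like the Python sketch below. It is written under assumptions: `fetch_harvest` is a hypothetical caller-supplied function that performs the GET above and returns the harvest as a dict with a `status` key, and the interval and poll limit are arbitrary defaults, not values recommended by the server.

```python
import time
from typing import Any, Callable, Dict

# Statuses in which a batch client only waits for the server to move on.
WAITING_STATUSES = frozenset({"scanning", "metadata_extraction", "processing"})


def poll_until_actionable(
    fetch_harvest: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the harvest leaves a server-driven status, then return it."""
    for _ in range(max_polls):
        harvest = fetch_harvest()
        if harvest["status"] not in WAITING_STATUSES:
            return harvest
        sleep(interval_seconds)
    raise TimeoutError(f"harvest still busy after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. The function returns as soon as the status is one where the client has something to do, such as metadata_review or complete.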

Filter Audio Recordings By Harvest

PATCH /audio_recordings/filter
{
  "filter": {
    "harvests.id": { "eq": harvest.id }
  }
}
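
In the body above, `harvest.id` stands for the numeric id of the harvest of interest. A small Python sketch of producing that body is shown here; the helper name `harvest_filter_body` is hypothetical.

```python
import json


def harvest_filter_body(harvest_id: int) -> str:
    """Return the JSON body that filters audio recordings by harvest id."""
    return json.dumps({"filter": {"harvests.id": {"eq": harvest_id}}})
```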