Data visualization opportunities - orcasound/orcadata Wiki

Here we offer Orcasound data products that data scientists and bioacousticians may enjoy visualizing and analyzing. Acoustic bouts define a time series of valuable signals in the Orcasound audio data streams. Human and machine detections indicate when community scientists or automated algorithms hear signals of interest while listening to Orcasound live audio streams.

Acoustic bouts

These are periods of time when interesting signals were detected within the Orcasound live audio streams. Some were detected by human listeners via the Orcasound web app (version 2 launched in May 2020); others were detected by automated algorithms, like OrcaHello (deployed in Sep/Oct 2020). Each acoustic bout is initially defined through a combination of human and machine detections, often contextualized by observations from local sighting networks. Bioacoustic experts then usually extend the start and end times through manual inspection, so that the bout includes weak signals and background noise conditions before and after the event.

The primary focus is on Southern Resident Killer Whale (SRKW) bouts, but our archive includes bouts of signals from Bigg's killer whales, humpback whales, and other soniferous species of the Salish Sea.

Human detections

15 May 2021 data clip (~2000 rows)

Version 2 of the live-listening web app offered an interactive feature to community scientists: a button to select whenever they heard anything interesting. Free text annotations were stored along with a datetime stamp (the time at which the tag was submitted). The datetime stamps in the database (and exported snapshots below) are stored in the UTC time zone. (Note that in the administrator UI of orcasite the timestamps are converted to and displayed in the local time zone -- e.g. Pacific Standard Time, or PST, for the Orcasound network which is based in Washington State, U.S.A.)
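
When comparing exported timestamps with what the orcasite administrator UI displays, the UTC values need to be shifted to Pacific time. A minimal sketch of that conversion follows. It assumes the exported stamps are ISO 8601-style strings without an explicit offset; the sample value is made up for illustration and is not taken from the data clip.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_to_pacific(stamp: str) -> datetime:
    """Parse a UTC timestamp string and convert it to Pacific time.

    Assumes an ISO 8601-style string; if it carries no offset,
    it is treated as UTC (as the wiki says the database stores it).
    """
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    # America/Los_Angeles handles the PST/PDT switch automatically.
    return parsed.astimezone(ZoneInfo("America/Los_Angeles"))


# Hypothetical winter timestamp, so the offset is PST (UTC-8).
local = utc_to_pacific("2021-01-15 20:30:00")
print(local.isoformat())
```

Using a named zone rather than a fixed 8-hour offset keeps summer detections correct, since Washington State observes daylight saving time.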

This is a ~9-month clip of the Heroku-hosted PostgreSQL database that holds these human detections. It was generated and given a preliminary analysis during a DemocracyLab hackathon associated with Western Governors University. The students have provided some tips on ingesting and processing these data in the hackathon project Google doc.

Explanation of the fields:

  • id = unique identifier for the record within the Postgres database
  • playlist_timestamp = Unix timestamp indicating when the annotated audio data stream began
  • player_offset = the number of seconds into the current stream when the annotation was made (likely time of label submission, rather than selection of the button)
  • source_ip = IP of annotator
  • feed_id = hydrophone location
  • inserted_at = ?
  • updated_at = ?
  • listener_count = # of simultaneous listeners at the time of the annotation
  • timestamp = datetime of the annotation (UTC time zone)
  • candidate_id = unique id for any temporal grouping of the detection(s)
  • description = free text label associated with the annotation event
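
The playlist_timestamp and player_offset fields described above can be combined to estimate when, in absolute time, an annotation was made. The sketch below assumes playlist_timestamp is expressed in whole seconds since the Unix epoch and player_offset in seconds. The row values are invented for illustration, and the helper name annotation_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def annotation_time(playlist_timestamp: int, player_offset: float) -> datetime:
    """Estimate the UTC time of an annotation.

    Assumption: playlist_timestamp is seconds since the Unix epoch
    (start of the stream) and player_offset is seconds into that stream.
    """
    stream_start = datetime.fromtimestamp(playlist_timestamp, tz=timezone.utc)
    return stream_start + timedelta(seconds=player_offset)


# Made-up example row; real rows come from the exported data clip.
row = {"playlist_timestamp": "1600000000", "player_offset": "125.5"}
when = annotation_time(int(row["playlist_timestamp"]), float(row["player_offset"]))
print(when.isoformat())
```

Comparing this derived time with the timestamp field may be a useful sanity check when ingesting the clip, since both should refer to roughly the same moment in UTC.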

Machine detections

OrcaHello live inference system

  • OrcaHello dashboard
    • summarizes raw and moderated 60-second candidates
    • offers lists of tags and comments on positive vs negative candidates, with links to audio and spectrograms

16 Sep 2021 OrcaHello detection table snapshot (~3500 candidates)

  • Raw JSON (10 MB, 3476 rows)
  • Acquired quasi-manually using Microsoft Azure Storage Explorer [Cosmos DB Accounts (deprecated)]
  • (Cosmos DB = aifororcasmetadatastore; predictions --> metadata --> Documents; query "SELECT * FROM c")
  • 3457 total candidates: 2639 moderated; 818 unmoderated.
  • Of the 2639 moderated candidates: 2280 false positives (86%); 347 true positives (13%); 12 unknown (<1%).
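
As a quick check on the tallies above, the percentages can be recomputed from the listed counts. This sketch uses only the numbers given in the snapshot summary; it shows that the percentages are shares of the moderated candidates rather than of all candidates.

```python
# Counts copied from the snapshot summary above.
moderated = {"false positive": 2280, "true positive": 347, "unknown": 12}
unmoderated = 818

moderated_total = sum(moderated.values())
total = moderated_total + unmoderated

# Share of each moderation outcome among moderated candidates only.
shares = {label: 100 * n / moderated_total for label, n in moderated.items()}

print(f"moderated: {moderated_total}, total: {total}")
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```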

REST API: programmatic access to OrcaHello candidates

Moderated and unmoderated output from the real-time inference system (Azure-based Cosmos DB database via Swagger)

Marine animal location

Data export from the real-time data collective, Acartia, that spans the range of the endangered Southern Resident Killer Whales (from northern California to northern British Columbia). Please take note of the Creative Commons license and attribution guidance within the Acartia community guidelines.

22 April 2022 data dump

  • 23,500 rows of (x,y,t…) from acoustic or visual observations of various marine species (SRKWs, Bigg’s KW, humpbacks)
  • .csv file (5.5 MB)
  • Google sheet

User data

Users of Orcasound software and content generate data that could be visualized and/or integrated with other data streams. The orcasound.net web content (a stand-alone WordPress site as of 2022) is tracked with Google Analytics, as is Orcasound's live-listening web app (deployed at live.orcasound.net). There are also user subscription and feedback forms that generate user data. Some of these data sources have been visualized by Adrian (in 2021-22) at the Orcasound User Data Dashboard.