Data visualization opportunities - orcasound/orcadata GitHub Wiki

Here we offer Orcasound data products that data scientists and bioacousticians may enjoy visualizing and analyzing. Acoustic bouts define a time series of valuable signals in the Orcasound audio data streams. Human and machine detections indicate when community scientists or automated algorithms hear signals of interest while listening to Orcasound live audio streams. Environmental observations that contextualize acoustic detections include Pacific salmon monitoring data. User data comes in many forms; it helps measure the conservation effectiveness of Orcasound apps and guides our user-centered design process.

Many of the fruits of our collaborations using Orcasound's open data are being shared in the orcaviz repo.

Acoustic bouts

These are periods of time when interesting signals were detected within the Orcasound live audio streams. Some were detected by human listeners via the Orcasound web app (version 2 launched May 2020); others were detected by automated algorithms, like OrcaHello (deployed in Sep/Oct 2020). Each acoustic bout is defined first through a combination of human and machine detections, often contextualized by observations from local sighting networks. Its start and end times are then usually extended, through manual inspection by bioacoustic experts, to include weak signals and background noise conditions before and after the event.

The primary focus is on SRKW bouts, but our archive includes bouts of signals from Bigg's killer whales, humpback whales, and other soniferous species of the Salish Sea.

Human detections

15 May 2021 data clip (~2000 rows)

Version 2 of the live-listening web app offered an interactive feature to community scientists: a button to select whenever they heard anything interesting. Free text annotations were stored along with a datetime stamp (the time at which the tag was submitted). The datetime stamps in the database (and exported snapshots below) are stored in the UTC time zone. (Note that in the administrator UI of orcasite the timestamps are converted to and displayed in the local time zone -- e.g. Pacific Standard Time, or PST, for the Orcasound network which is based in Washington State, U.S.A.)

This is a ~9-month clip of the Heroku-hosted PostgreSQL database that holds these human detections. It was generated and given a preliminary analysis during a DemocracyLab hackathon associated with Western Governors University. The students have provided some tips on ingesting and processing these data in the hackathon project Google doc.

Explanation of the fields:

  • id = unique identifier for the record within the Postgres database
  • playlist_timestamp = a Unix datetime stamp indicating when the annotated audio data stream began
  • player_offset = the number of seconds into the current stream when the annotation was made (likely time of label submission, rather than selection of the button)
  • source_ip = IP of annotator
  • feed_id = hydrophone location
  • inserted_at = ?
  • updated_at = ?
  • listener_count = # of simultaneous listeners at the time of the annotation
  • timestamp = datetime of the annotation (UTC time zone)
  • candidate_id = unique id for any temporal grouping of the detection(s)
  • description = free text label associated with the annotation event
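
The field list above suggests that the position of an annotation within the audio can be estimated by adding player_offset to playlist_timestamp. The following is a minimal sketch, assuming that playlist_timestamp is expressed in Unix seconds (UTC) and that each exported row has been read into a dictionary keyed by the field names above (for example with csv.DictReader). The function name and the sample values are hypothetical, not taken from the data clip.

```python
from datetime import datetime, timedelta, timezone


def annotation_audio_time(row):
    """Estimate the UTC time within the audio stream at which an annotation was made.

    Assumes playlist_timestamp is Unix seconds and player_offset is the number
    of seconds into that stream, as described in the field list.
    """
    stream_start = datetime.fromtimestamp(int(row["playlist_timestamp"]), tz=timezone.utc)
    return stream_start + timedelta(seconds=float(row["player_offset"]))


# Hypothetical row: the values are illustrative only.
example_row = {"playlist_timestamp": "1600000000", "player_offset": "90.5"}
print(annotation_audio_time(example_row).isoformat())
```

Comparing this estimate with the timestamp field may help check whether the offset reflects the time of label submission, as the field description suggests.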

Machine detections

OrcaHello live inference system

  • OrcaHello dashboard
    • summarizes raw and moderated 60-second candidates
    • offers lists of tags and comments on positive vs negative candidates, with links to audio and spectrograms

16 Sep 2021 OrcaHello detection table snapshot (~3500 candidates)

  • Raw JSON (10 MB, 3476 rows)
  • Acquired quasi-manually using Microsoft Azure Storage Explorer [Cosmos DB Accounts (deprecated)]
  • (Cosmos DB = aifororcasmetadatastore; predictions --> metadata --> Documents; query "SELECT * FROM c")
  • 3457 total candidates: 2639 moderated; 818 unmoderated.
  • Of the 2639 moderated candidates: 2280 false positives (86%); 347 true positives (13%); 12 unknown (<1%).
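
The percentages in the summary appear to be computed relative to the moderated candidates rather than to all candidates. A quick check, sketched with only the counts quoted above (the variable names are illustrative):

```python
# Counts copied from the snapshot summary above.
moderated = {"false positive": 2280, "true positive": 347, "unknown": 12}
unmoderated = 818

total_moderated = sum(moderated.values())
total_candidates = total_moderated + unmoderated

print(f"moderated: {total_moderated}, all candidates: {total_candidates}")
for label, count in moderated.items():
    share = 100 * count / total_moderated
    print(f"{label}: {count} ({share:.1f}% of moderated)")
```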

REST API: programmatic access to OrcaHello candidates

Moderated and unmoderated output from the real-time inference system (Azure-based Cosmos DB database, via Swagger)

Marine animal location

Orcasound bioacoustic events can be contextualized and analyzed with marine animal locations from the real-time data collective, Acartia. The collective spans the range of the endangered Southern Resident Killer Whales -- from northern California to northern British Columbia. When creating derivative works from the spatial data, please take note of the Creative Commons license and attribution guidance within the Acartia community guidelines.

Exceptional tracks

The best yet of aggregated tracks from sighting and listening networks:

2023

Thanks to Rachel, Marla, Serena, and Alisa at Orca Network for suggesting some of these!

  1. 12/23-28/23: J pod spends 5 days in Puget Sound and welcomes J60, a new male calf
  2. 12/14-29+/23: Humpback yearling spends 2+ weeks in Dalco Passage
  3. 9/17/23: Near strike of humpback CRC-20243 by Bremerton ferry!
  4. 9/16/23: Sat with T65As and T37/137 groups tracked in Puget Sound, including Thea Foss waterway incursion
  5. 9/11-13/23: J pod transits Haro Strait, then spends 2 days in Puget Sound in first fall visit
  6. ~7 days at end of Aug/first days of Sep: T65As deep in south Puget Sound, then Hood Canal; previously they were in the San Juans (but not all Facebook threads were georeferenced...)
  7. T99s within Dyes Inlet for 5 days in 2023

2024

  1. 3/15: ~12-hour/80-km Bigg's killer whale track (T46s and T124Ds, including Thor and Strider; 07:55 acoustic detection at Sunset Bay to 19:30 visual observation in Dalco Passage)

Emerging challenges and opportunities

Other problems to tackle and ideas for visualizing Acartia data:

  • Focus on the movement of one pod (e.g. J pod's fall visits to Puget Sound) and compute speeds, tortuosity, and other stats; study consistency of movement patterns and annual timing; build a pod-specific forecast model for orcamap-react; etc.
  • Focus in on the Bigg's matrilines that have frequented Puget Sound proper the most in recent years, e.g. the T65As or T137s
  • Focus on one location, like Purdy Bridge, and examine killer whale (KW) occurrence and movement there over years or decades. For example, in 2023 the T65As went under the bridge (a rare event historically).
  • Test how recent tracks compare with historic ones, e.g. by geo-referencing past events from the Orca Network report archive (https://indigo-ukulele-jm29.squarespace.com/sightings-report-archive). One idea from Rachel was the April 2016 (pre-Acartia) event with 11 members of J pod going back and forth along the east side of Whidbey Island (for multiple days?)
  • Compare all vs trusted observer data for unusual species over the full range of Orca Network reports (2001-present)
  • Study how J, K, and L pods come together and/or separate (a visualization challenge of merging or bifurcating tracks)
  • Visualize "splits" and "rejoins" of individuals or sub-groups from their matriline and/or pod (e.g. Indy)
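
As a starting point for the speed and tortuosity ideas in the list above, here is a minimal sketch. It assumes that a track has already been extracted as time-ordered (latitude, longitude, Unix seconds) tuples; the actual column names and units in the Acartia data dumps are not assumed here. Tortuosity has several definitions in the literature; this sketch uses path length divided by the straight-line distance between the first and last points. The function names and the sample track are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_stats(points):
    """Summarize a track of time-ordered (lat, lon, unix_seconds) tuples.

    Returns the path length (km), the mean speed (km/h), and the tortuosity,
    defined here as path length over straight-line start-to-end distance.
    """
    path_km = sum(
        haversine_km(a[0], a[1], b[0], b[1]) for a, b in zip(points, points[1:])
    )
    hours = (points[-1][2] - points[0][2]) / 3600
    straight_km = haversine_km(points[0][0], points[0][1], points[-1][0], points[-1][1])
    return {
        "path_km": path_km,
        "mean_speed_kmh": path_km / hours if hours > 0 else float("nan"),
        "tortuosity": path_km / straight_km if straight_km > 0 else float("inf"),
    }


# Hypothetical three-point track (not real sightings), one hour between points.
track = [(47.60, -122.45, 0), (47.50, -122.45, 3600), (47.40, -122.40, 7200)]
stats = track_stats(track)
print(stats)
```

Because sightings are irregular in time and position accuracy varies, speeds computed between consecutive reports are best treated as rough lower bounds on actual swimming speed.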

07 March 2024 data dump

  • 37,000 rows of (x,y,t…) from acoustic or visual observations of various marine species (SRKWs, Bigg’s KW, humpbacks)
  • .csv file (8.0 MB)
  • Google sheet

22 April 2022 data dump

  • 23,500 rows of (x,y,t…) from acoustic or visual observations of various marine species (SRKWs, Bigg’s KW, humpbacks)
  • .csv file (5.5 MB)
  • Google sheet

Pacific Salmon data

The Southern Resident killer whales are focused on Pacific salmon as their primary prey source, especially large Chinook salmon returning to the big rivers of British Columbia (the Fraser) and Washington (the Columbia). Information about the marine distribution of salmon or the timing of their return to these rivers can help contextualize the acoustic and visual observations of SRKWs across their range -- from northern California to Alaska.

Here are some sources of salmon data that could provide context for SRKW presence and movement observations:

User data

Users of Orcasound software and content generate data that could be visualized and/or integrated with other data streams. The orcasound.net web content (a stand-alone WordPress site as of 2022) is tracked with Google Analytics, as is Orcasound's live-listening web app (deployed at live.orcasound.net). There are also user subscription and feedback forms that generate user data. Some of these data sources have been visualized by Adrian (in 2021-22) at the Orcasound User Data Dashboard.