Home - orcasound/orcadata GitHub Wiki

Welcome to the orcadata wiki!

This is a place to share and collaborate, especially regarding bioacoustic analysis of real-time and archived audio data related to the Orcasound open source project. Here you can learn more about Orcasound machine learning resources related to orcas (training sets | test sets) and get access to Orcasound data, both archived training and testing data and real-time audio streams. You may also be interested in the synopses of projects that leverage these open data at ai4orcas.net.

Data Resources

Most recent progress (within the last year)

2023

  • Nov: Launch of v3 live listening app (540 category tags in first month, reviewable via reports list). 5th entrance of SRKWs into Puget Sound (15 new bouts recorded). Onboarding intern Lucy Day to build open SRKW whistle catalog & labeled data in Jan 2024.
  • Oct: Beta testing of v3 Orcasound web app, including annotation categories: whale, vessel, & other. Extraction of example tracks for SRKWs, Bigg's KWs, and humpbacks from the Acartia data cooperative. Onboarding Ayushman Tripathi to orcamap project.
  • Sep: Microsoft hackathon doubles number of network nodes being monitored by OrcaHello; begins work on classifier for humpback non-song vocalization.
  • Aug: Repair of Sunset Bay hydrophone streaming system. HALLO team finalizing new online catalogue for SRKW calls. Lucy/Paul share Discord for Salish Sea bioacoustic discussions.
  • Jul: Testing audio stream performance from new node at the Marine Science and Technology (MaST) Center. Ze updates Orca-Eye-Aye repo README. Paul leads beta-launch of new human classification UI (whale, vessel, other icons) in Orcasound web app in preparation for September launch of v3.
  • Jun: Testing audio stream performance via YouTube relay from the Point Robinson lighthouse. Ze archives initial non-AIS vessel image train/test set in S3 Visual Sandbox bucket with 2641 vessel images across all classes. Prakruti reduces Azure burn rate and works on OrcaHello performance between nodes as well as surfacing extant models with weights via tools like HuggingFace. Skander adds ASH to orcasite admin to enable more advanced email notifications (e.g. of human detections and validations).
  • May: Testing audio stream performance from new north San Juan Channel node. SIMRES/SFU/HALLO students Lauren and Olivia use Orcasound data to study SRKW call compensation in ship noise. Completing transition to Amazon-sponsored S3 data buckets for Orcasound open data (Registry | Data Exchange on AWS Marketplace).
  • Apr: Val, Valentina, and Scott attend launch party in Nanaimo for the open source bioacoustic dashboard project led by Ben Hendricks and Jono Mendez with support from Orcasound and the BC Hydrophone Network. Ben mocks up visualization of OrcaHello confirmed detections in the
  • Mar: Final poster presentations of the UW Masters in Data Science students: Caleb Case, Mitch Haldeman, and Grant Savage. Their pioneering project, which computed noise metrics from Orcasound archived data, was mentored by Valentina Staneva, Val Veirs, and Scott Veirs. Initial working YOLO model for classifying non-AIS vessels developed by Beam Reach extern Ze Cui, mentored by Sam of Marine Monitor (M2), Val Veirs, and Scott Veirs.
  • Feb: John Ford contributes historic recordings of SRKWs from a 1997 event in Dyes Inlet to the Orcasound open data repository. The recording was acquired by the Center for Whale Research, a member of Orcasound, and is being shared with permission from their Research Director, Dr. Michael Weiss. Ben Hendricks and Jono Mendez request feedback on a prototyped bioacoustic dashboard developed in collaboration with the BC Hydrophone Network, coordinated by Janie Wray. Rachael Cheng, Val Veirs, and David Bain collaborate on orca call autoencoders and similarity algorithms. HALLO project formalizes 4-year Canadian government grant for AI-assisted SRKW movement monitoring and forecasting system, with Orcasound as a U.S. collaborator.
  • Jan: Scott initiates general orca-ai team in Orcasound organization on GitHub; UW MS data science student team finalizes project plan with Valentina to advance noise analysis with Orcasound open data archive (and Scott puts Port Townsend node into research mode to test FLAC vs HLS noise analysis); Valentina builds initial catalog of Orcasound archive; Orcasound labels more SRKW and humpback bouts, including S04 calls and whistles in first labeled bout from Point Robinson in southern Puget Sound. HALLO continues beta-testing new online catalogue with SRKW calls.

2022

  • Dec: Orcasound volunteer data scientist Zoe pioneers first semi-automated integration of SRKW movement data from Acartia and Chinook salmon counts from the Fraser and Columbia rivers.
  • Nov: Val delivers autoencoder talk for ONC/Meridian workshop; Ze spins up project to automate vessel image classification, a collaboration of Orcasound and Protected Seas using the M2 system deployed at Val's Orcasound Lab node; WDFW grants $25k for maintaining/expanding Orcasound nodes as real-time oil spill response equipment.
  • Oct: Rachael Cheng in Berlin joins Orcasound standups to share NRKW AI progress and open source code, along with Alex Barnhill and Christian Bergler; Val starts work on rough SRKW click classifier (buzz, slow, fast); Valentina proposes project to UW Data Science Capstone on historical noise analysis; Ben Hendricks joins Orcasound standup with updates on open source bioacoustic dashboard project, a collaboration of Orcasound & the BC hydrophone network.
  • Sep: Orcasound's GSoC 2022 contributors make final reports; DemocracyLab hackathon (9/10) connects Acartia.io to orcamap; Microsoft hackathon (9/20-22, GitHub Project) refines OrcaHello UI, model training/deployment/monitoring, notifications, begins annotations to SRKW pod and call type, and establishes first Kaggle for orca calls.
  • Aug: HALLO workshop on open data for SRKW movement forecast modeling (Aug 31 - Sep 01); Orcasound applies for AWS Open Data sponsorship (2 years); planning for Microsoft and DemocracyLab hackathons in Sept.
  • Jul: First blog posts from Orcasound GSoC 2022 contributors regarding: open source approaches to de-noising and source separation; ingestion of OOI hydrophone data from Oregon; refinement of the Orca Active Learning tool code & deployment.
  • Jun: Orcasound Google Summer of Code (GSoC) 2022 students begin coding.
  • May: At DCLDE 2022 workshop, Beam Reach extern Emily Vierling shares her Haro Humpback open data & dictionary project, including a humpback non-song vocalization dictionary based on recordings from Haro Strait, WA, and an annotated training data set for 12 humpback signal types.
  • Apr: Earth Day hackathon organizes Orcasound open data visualization opportunities; OrcaHello Azure subscription extended until Oct 2022.
  • Mar: OrcaHello Dashboard reaches 3,500 annotated 1-min candidates; Orcasound and HALLO project present at the DCLDE workshop in Hawaii.
  • Feb: Orcasound accepted as 2022 GSoC host organization (3rd year).
  • Jan: OrcaHello tag cloud curated using standardized dictionary of labels.

2021

  • Dec: Orcasound presents at the Acoustical Society of America meeting in Seattle.
  • Nov: SRKWs in Puget Sound, humpbacks in Haro! OrcaHello migrates to new Azure subscription; coordination with HALLO on ASA/DCLDE/SSEC talks; Orcasound extern Emily Vierling catalyzes humpback non-song vocalization label standardization.
  • Oct: Beluga in Puget Sound! OrcaHello team improves real-time inference system during annual hackathon (Oct 12-14), including re-training model, continuous integration, moderator UI enhancements, and documentation. MBARI publishes acoustic archive via AWS open data repository.

For more details, see the growing list of documentation pages for each Orcasound machine learning effort.

Deeper history of AI for Orcas project

Since the early 2000s, members of the Orcasound community have been contemplating the application of artificial intelligence to the problem of detecting orcas acoustically. Orcasound's AI for Orcas project page describes the evolution of our collective efforts. #ai4orcas