Orca training data - orcasound/orcadata GitHub Wiki

There are many recordings of killer whales available, but relative to other marine mammal species, there is a paucity of labeled data. For example, many toothed whale (odontocete) species are included in the Mobysound archive, but not yet Southern Resident Killer Whales (as of July 2020).

This page documents the growing array of labeled data specific to killer whale ecotypes, with a primary focus on Southern Resident Killer Whales, and a secondary focus on other ecotypes of the Northeast Pacific Ocean. Open data sources, including those provided by Orcasound member organizations, are listed first to promote collaboration. Closed data sources are listed in the hope that they become available to the open-source and open-data communities in the future, or are otherwise valuable as reference points.

Note: these data are for training models. For test data, please refer to the orca test data wiki page.

Open data sources

Orcasound data

This section contains Orcasound data sets aimed at training machine learning models to detect and/or classify the signals of killer whales. The primary focus is on binary classification of Southern Resident Killer Whale (SRKW) calls (yes/no for any call type), but labels may also indicate SRKW call type, whistles, or clicks, as well as associations with pods (J, K, and/or L), matrilines, and -- in rare cases -- isolated individuals. There are also some resources related to Bigg's (transient) killer whales.

NOTE: These data cannot be accessed through a browser. Instead, note the URL and use the AWS Command Line Interface (CLI) in a terminal window to access the public files. See the Data access via AWS CLI page to learn more about the AWS CLI. Many of these data are aggregated within the Orcasound "Acoustic Sandbox" (a public S3 bucket).
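As a minimal sketch of that workflow, the helper functions below wrap the standard `aws s3 ls` and `aws s3 cp` commands with the `--no-sign-request` option, which lets the AWS CLI read a public bucket without credentials. The bucket name `acoustic-sandbox` and the function names `s3_list` and `s3_fetch` are assumptions made for illustration; substitute the bucket and paths from the URL you noted.

```shell
# Assumption: name of the public "Acoustic Sandbox" bucket.
# Replace it with the bucket from the URL you noted.
BUCKET="acoustic-sandbox"

# List public objects under a prefix, without AWS credentials.
s3_list() {
  aws s3 ls --no-sign-request "s3://${BUCKET}/$1"
}

# Copy one public object to a local directory (defaults to the current one).
s3_fetch() {
  aws s3 cp --no-sign-request "s3://${BUCKET}/$1" "${2:-.}"
}

# Example usage (requires the AWS CLI and network access).
# The file path is a hypothetical placeholder, not a real object key.
# s3_list ""
# s3_fetch "path/to/file.wav" ./data/
```

Listing the bucket first is a cheap way to confirm the path before downloading, since some of the data sets may be large.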

Other open labeled data sources

Closed or restricted data sources

Non-Orcasound data sources (not yet open, or licensing unclear; labeled or unlabeled)

  • SRKWs
    • Orca Behavior Institute (Monika Wieland), historic data from the cabled hydrophone at Lime Kiln State Park
    • NOAA (Brad Hanson, Marla Holt, Candice Emmons): autonomous recorders on outer coast WA and DTAG deployments on SRKWs
    • ONC (Kristen Kanes, Science open data set in 2020?), cabled arrays on outer BC shelf (Barkley Canyon; and Georgia Strait? Early versions were not specific to ecotype?)
    • DFO (James Pilkington), mostly autonomous recorders on outer coast BC (mostly clips? may be specific to ecotype)
    • SMRU/TWM (Jason Wood), some labeled by Alex Harris (30,000 general KWs; 30,000 non-KWs)
    • JASCO (David Hannay? Ruth Joy?), 5 second clips
  • NRKWs
    • OrcaLab (Paul Spong, Helena Symonds), cabled near-shore hydrophones in Johnstone Strait, B.C.
      • Orchive (data archive by Steve Ness at UVic)
      • OrcaSPOT (Bergler et al. ML effort published in 2019)
      • Rachel Cheng suggested referencing Steven Ness's thesis for a description of the Orchive data set. Christian Bergler used ~424 labeled pulsed calls from Paul and Helena to compare an unsupervised clustering method with supervised classification (2019). They were told that those labeled calls were used at OrcaLab to train volunteers to recognize NRKW signals. More details about the call types can be found in the paper. Unfortunately, those labeled calls cover only the matrilines frequently sighted in Johnstone Strait and are not a complete snapshot of the NRKW vocal repertoire. (We wondered, but have not yet clarified, which clans and matrilines are represented in those labels.)
    • [Cetacean Research Technology recordings of Springer](https://www.cetaceanresearch.com/sounds/springer-sounds.html) (A-pod juvenile A73, aka Springer) by Joe Olson. The 19 January 2002 recordings were made between Seattle and Vashon Island, near the ferry lanes, when she was isolated from her natal pod in Puget Sound (believed to be the first recordings ever of an individual wild killer whale). Exactly five years after her successful return to the NRKW community on 14 July 2002, Joe recorded A73 again -- this time on 14 July 2007 in Johnstone Strait, accompanied by other members of her extended family belonging to the A8, A11, and A12 subpods. (2023 archive of CRT page and audio files)
    • David Bain recordings of A73 in Puget Sound during winter/spring, 2002?
    • Pacific Wild unlabeled archive (Soundcloud), cabled near-shore hydrophones in central B.C., near Bella Bella.
  • Alaska residents
    • OrcaCNN (Dan Olsen), many recordings from autonomous and boat-based hydrophone recording systems
  • Bigg's (transients)
    • No labeled data (to our knowledge)
    • Raw data sources:
      • Alaskan transients (via Dan Olsen and Hannah Myers)
        • Many recordings of AT1s (only 7 individuals left; unique sounding calls relative to other transients)
        • Gulf of Alaska transients (need to be digitized)
  • California recordings (ecotype uncertain unless in a sub-section)
  • Antarctic ecotypes