Other training data: humpback whales - orcasound/orcadata GitHub Wiki

The "Haro Humpback" catalog & open annotations

Over the winter of 2021-2022, Emily Vierling worked with Val and Scott Veirs as a Beam Reach "extern" (mostly-remote internship during COVID) to describe humpback signals within the open data from Orcasound Lab hydrophones (Haro Strait, WA, USA). Leveraging her previous training with Helena Symonds and Paul Spong of OrcaLab, listening to humpbacks in Johnstone Strait (BC, Canada), Emily developed a new online Haro Humpback dictionary and annotated thousands of signals in Orcasound open data.

An open, collaborative humpback signal catalogue (catalog)

Presented by Emily at the DCLDE 2022 workshop in spring 2022, the catalogue contains 12 signals that she found to be most common in recordings made primarily in the late fall (presumably of male humpback whales beginning to vocalize before leaving the Salish Sea for tropical wintertime habitat in Hawaii and/or Mexico). In addition to the 2022 version that Emily published via WordPress, the catalogue is shared via the signal-catalogue GitHub repo, where we hope new versions of the code can be maintained to provide the bioacoustic community with a generic tool for building online and offline signal catalogues.

The 12 signal types in version 1.0 of this humpback signal dictionary are:

  1. Whup
  2. Grunt
  3. Ascending Moan
  4. Descending Moan
  5. Moan
  6. Upsweep
  7. Trumpet
  8. Growl
  9. Creak
  10. Buzz
  11. Shriek
  12. Chirp

Labeled data overview & attribution

Emily's annotated data include ~9,000 labels and are based on ~YY hours of audio from 3 days between October 03-28, 2021. These labeled data are part of Orcasound's AWS Open Data registry and are freely available under Orcasound's Creative Commons license (CC BY-NC-SA). Please attribute any use of the dictionary and/or labeled data to "Emily Vierling, 2022, Orcasound" with a link back to orcasound.net.

Annotation procedure details

Emily used RavenPro to annotate the Orcasound data, as she was already familiar with the software from her analysis of OrcaLab data. Orcasound now recommends open-source tools like Audacity or the HALLO annotation tool for annotation tasks.


  1. Did you annotate every audible/visible humpback sound in the recordings, or only the "good" ones, i.e. only the ones you felt confident about, or only the ones detected by some algorithm?

I manually annotated every signal that was audibly clear and could be seen clearly on a spectrogram, so ones that I felt very confident about. I also tried to avoid, to the best of my ability, signals that were in the midst of a lot of background noise or had clipping in order to make sure they were of high quality.

  2. Were there other biological sounds in the files that you did not annotate? For example, killer whales or fish? (If so, could you let us know which?)

There were some occasional fish 'burps' in the recordings that I did not annotate. I listened quite closely while I was annotating and was able to discern those from the humpback calls; if I wasn't 100% sure a sound was a humpback call, I didn't annotate it. There were also some non-biological signals that showed up on the spectrogram that I did not annotate (hydrophone contact sounds, loud water conditions, etc.).

  3. Did you draw the annotation boxes snugly around the calls? (i.e. as tight as possible while still including the entire call)

I drew the annotation boxes from the very beginning of the call to its end (where it was visible on the spectrogram), encapsulating the entirety of the call, so the boxes were as tight as possible.

  4. Where was the hydrophone data recorded?

All of the hydrophone data was recorded from the Orcasound Lab location. See below for more metadata, including site coordinates.

Audio and annotation metadata

Audio Data

Annotation Files

  • License/data sharing agreement: Creative Commons license (CC BY-NC-SA)
  • Annotator: Emily Vierling
  • Method (manual or semi-manual): manual
  • Detector (if applicable): N/A
  • Filelist: s3://acoustic-sandbox/humpbacks/Emily-Vierling-Orcasound-data/Em_HW_data/Annotations/ | URL via Quilt
  • Granularity (call, file, encounter): non-song vocalization
  • Resolution (species, ecotype, call type, etc.): species; possibly individual(s) in some cases, depending on sightings data
  • Columns (for each column provide description of content and possible values):
    • Selection: sequential numbering within the annotation file for each labeled signal
    • Begin Time (s): seconds into the recording when the annotation bounding box begins
    • End Time (s): seconds into the recording when the annotation bounding box ends
    • Low Freq (Hz): lower frequency bound of annotation bounding box
    • High Freq (Hz): upper frequency bound of annotation bounding box
    • Call Type: 12 non-song vocalization categories for "Haro Humpbacks" and humpbacks observed by OrcaLab in Johnstone Strait
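RavenPro selection tables are tab-separated text files with the columns listed above. As a hedged sketch (the example rows are invented for illustration; real files live under the S3 prefix listed above), they can be loaded with pandas like this:

```python
# Sketch: parse a RavenPro selection table (tab-separated) with pandas.
# The two data rows here are made up for illustration only.
import io
import pandas as pd

raven_txt = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tLow Freq (Hz)\tHigh Freq (Hz)\tCall Type\n"
    "1\t12.5\t14.1\t80.0\t400.0\tWhup\n"
    "2\t37.2\t38.0\t150.0\t900.0\tGrunt\n"
)
df = pd.read_csv(io.StringIO(raven_txt), sep="\t")

# Derive each annotation's duration from its bounding-box times.
df["Duration (s)"] = df["End Time (s)"] - df["Begin Time (s)"]
print(df[["Selection", "Call Type", "Duration (s)"]])
```

For a real file, replace the in-memory string with the path to a downloaded annotation file and keep `sep="\t"`.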

Pre-processed deep/machine learning data set

We are also sharing the training data (s3://acoustic-sandbox/humpbacks/Emily-Vierling-Orcasound-data/Em_HW_Processed/ | URL via Quilt) that Val developed based on Emily's work. It includes fixed-window audio clips and associated spectrograms. Preliminary documentation of his efforts can be found in the signal-annotation GitHub repo.
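To illustrate what "fixed-window audio clips" means in practice, here is a minimal sketch of cutting a fixed-length window around an annotation. The window length, centering on the annotation, and zero-padding at recording edges are assumptions for illustration, not necessarily Val's exact processing recipe:

```python
# Hedged sketch: extract a fixed-length clip centered on an annotation's
# time bounds, zero-padding if the window runs past the recording edges.
# Window length and centering are illustrative assumptions.
import numpy as np

def fixed_window_clip(audio, sr, begin_s, end_s, window_s=2.0):
    """Return a window_s-long clip centered on the annotation."""
    center = (begin_s + end_s) / 2.0
    start = int(round((center - window_s / 2.0) * sr))
    stop = start + int(round(window_s * sr))
    clip = np.zeros(stop - start, dtype=audio.dtype)
    lo, hi = max(start, 0), min(stop, len(audio))
    clip[lo - start : hi - start] = audio[lo:hi]  # copy the in-bounds part
    return clip

# Example: 10 s of synthetic audio at 16 kHz, annotation spanning 1.0-2.0 s.
sr = 16_000
audio = np.random.randn(10 * sr).astype(np.float32)
clip = fixed_window_clip(audio, sr, begin_s=1.0, end_s=2.0)
print(clip.shape)  # (32000,)
```

A spectrogram of each clip (e.g. via `scipy.signal.spectrogram`) would then yield the paired image data described above.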