Orca ML teams & pages - orcasound/orcadata GitHub Wiki

Documentation of machine learning efforts related to the Orcasound project

These pages contain detailed documentation of distinct machine learning efforts. Each page should include at least a description of the effort with links to the underlying metadata, data, algorithms, and results. The long-term goal of this documentation is to make these efforts easier to share as open source and to reproduce.

  • Pod.Cast (Microsoft team: Akash Mahajan, Prakruti Gogia, and Nithya Govindarajan)

  • OrcaHello (Microsoft team led by Chris Hanke and Dave Bain)

  • Detecting sounds other than orca calls on the realtime stream.

    • humpback calls
      • some progress has been made applying a trained algorithm (see discussion), but the results still require verification, possibly with some training data from the Humpback catalogue
    • orca echolocation clicks (see this issue)
      • @veirs, @pmdnhd can provide more suggestions on algorithms and testing datasets
    • ships
      • a small subset of ship annotations, extracted from the OrcaHello annotations, exists on the S3 bucket:
        • s3://acoustic-sandbox/acoustic-separation/dataset/
      • you can also look at the ambient-noise-analysis project, which focuses on extracting power spectrum statistics to characterize ambient noise
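
As a rough illustration of what "power spectrum statistics" can mean for ambient noise, here is a minimal sketch that splits a signal into frames, computes a windowed periodogram per frame, and summarizes each frequency bin by its median and 95th-percentile level in dB. It is not the ambient-noise-analysis project's code: the function names, the frame length, the non-overlapping framing, and the synthetic 1 kHz test tone are all assumptions made for the example. It uses only the Python standard library (with a naive DFT) so that it runs anywhere; for real hydrophone data a library routine such as `scipy.signal.welch` would be the more practical choice.

```python
import cmath
import math
import random
import statistics


def periodogram(frame, sample_rate):
    """Return (freqs_hz, power) for one frame, using a Hann window and a naive DFT."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    norm = sum(w * w for w in window)  # compensates for the window's energy loss
    freqs, power = [], []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2 / norm)
    return freqs, power


def spectrum_stats(samples, sample_rate, frame_len=64):
    """Per-bin median and 95th-percentile level (dB) over non-overlapping frames."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    if len(starts) == 0:
        raise ValueError("need at least one full frame of samples")
    spectra_db = []
    for start in starts:
        freqs, power = periodogram(samples[start:start + frame_len], sample_rate)
        # Small offset avoids log10(0) for silent bins.
        spectra_db.append([10 * math.log10(p + 1e-12) for p in power])
    # Transpose so each entry holds one frequency bin's levels across frames.
    per_bin = [sorted(column) for column in zip(*spectra_db)]
    median_db = [statistics.median(column) for column in per_bin]
    p95_db = [column[min(len(column) - 1, int(0.95 * len(column)))]
              for column in per_bin]
    return freqs, median_db, p95_db


# Synthetic stand-in for hydrophone audio: a 1 kHz tone plus weak noise (assumed values).
rng = random.Random(0)
sample_rate = 8000
samples = [math.sin(2 * math.pi * 1000 * i / sample_rate) + 0.01 * rng.gauss(0, 1)
           for i in range(1024)]

freqs, median_db, p95_db = spectrum_stats(samples, sample_rate)
peak_hz = freqs[median_db.index(max(median_db))]
print(f"median spectrum peaks at {peak_hz:.0f} Hz")
```

The median tracks the typical background level in each band, while a high percentile highlights intermittent loud events such as passing ships; which percentiles are most useful for a given hydrophone is left open here.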

There are a variety of other sounds in the Orcasound stream that we can build classifiers for. For some of them, training data is scarce, so we can start with existing algorithms and see where they go wrong by applying them to the current data stream. To that end, we have started building infrastructure to apply arbitrary algorithms to the raw data through GitHub Actions. You can start by looking at this issue.