INA Speech Segmenter - AudiovisualMetadataPlatform/amp_documentation GitHub Wiki

INA Speech Segmenter

  • About
    • InaSpeechSegmenter is a Python-based tool that takes a binary file as input and produces a list of audio segments. Each segment contains a start, an end, and a label.
  • Source Code
    • galaxy/tools/ina_speech_segmenter/ina_speech_segmenter.xml
      Tool configuration detailing tool execution, input file, output file, and labeling.
    • galaxy/tools/ina_speech_segmenter/ina_speech_segmenter_wrapper.py
      Python script that calls the speech segmenter through its API and conforms the resulting JSON to the schema.
    • galaxy/tools/ina_speech_segmenter/segmentation_schema.py
      Set of classes representing the segmentation schema.
  • Dependencies
  • Installation:
            $ pip install tensorflow-gpu # for a GPU implementation (recommended if your host has a GPU)
            $ pip install tensorflow # for a CPU implementation
            # install the framework and its dependencies
            $ pip install inaSpeechSegmenter
  • Running the tool
    • The tool can be invoked from the Galaxy UI like any other tool. Before running it, use the Get Data / Upload from computer tool to ingest the input file into Galaxy.
      When ingesting, choose binary (the default) as the file format. The file is then copied into a designated location in the Galaxy file system.
  • Parameters
    • $input_file: the audio file to run the segmentation on.
    • $json_file: the output JSON file.
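
As a rough illustration of the flow described above (audio file in, JSON list of segments out), the following is a minimal sketch and not the actual contents of ina_speech_segmenter_wrapper.py. It assumes the segmenter returns a list of (label, start, end) tuples, as the inaSpeechSegmenter `Segmenter` object does when called on a media path. The function names `segments_to_json` and `run_segmenter`, and the top-level `"segments"` key, are hypothetical; the real layout is defined by segmentation_schema.py.

```python
import json


def segments_to_json(segments):
    """Turn (label, start, end) tuples into a JSON string.

    The "segments" wrapper key is a hypothetical layout chosen for this
    sketch; the AMP schema classes may structure the output differently.
    """
    records = [
        {"label": label, "start": float(start), "end": float(end)}
        for label, start, end in segments
    ]
    return json.dumps({"segments": records}, indent=2)


def run_segmenter(input_file, json_file):
    """Segment input_file and write the result to json_file (hypothetical helper)."""
    # Imported here so segments_to_json can be used without TensorFlow installed.
    from inaSpeechSegmenter import Segmenter

    segmenter = Segmenter()
    # Calling the segmenter on a media path yields (label, start, end) tuples.
    segments = segmenter(input_file)
    with open(json_file, "w") as out:
        out.write(segments_to_json(segments))
```

In this sketch, `run_segmenter` would receive the values Galaxy substitutes for $input_file and $json_file, while `segments_to_json` can be exercised on its own with hand-written tuples such as `[("speech", 0.0, 1.5)]`.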

Document generated by Confluence on Feb 25, 2025 10:39
