Gentle Forced Alignment - AudiovisualMetadataPlatform/amp_documentation GitHub Wiki

Gentle Forced Alignment

  • About
    • The Forced Alignment tool is a wrapper around Gentle, a Python forced-alignment tool and API.
  • Source Code
    • AMP's fork of the Gentle repo: https://github.com/AudiovisualMetadataPlatform/gentle
      • Only a slight change was made to the Kaldi installation bash script. 
    • Singularity container to build the Gentle tool: https://github.com/AudiovisualMetadataPlatform/gentle-singularity
      • The Singularity wrapper around Gentle removes the numerous dependencies from the build process. It also removes the need to keep a port open for the API.
    • /srv/amp/gentle-singularity/gentle-singularity.sif: Singularity container file for the forced alignment code, with all required dependencies built in
    • galaxy/tools/amp_stt/gentle_forced_alignment.py: Python wrapper script that runs forced alignment in the Singularity container
  • Dependencies
    • All dependencies are included in the Singularity .sif file; no extra installation is needed.
  • Usage: See https://github.com/AudiovisualMetadataPlatform/gentle for details on how to install, build, and run Gentle.
  • Parameters
    • $input_audio_file: Input audio file in WAV format
    • $input_transcript_file: Input transcript file in AMP STT JSON format
  • Output
    • amp_transcript: JSON file in AMP Transcript format
  • Notes
    • In some instances, words in the input transcript cannot be found in the audio. For each such word, Gentle produces a JSON node like this:

      { "case": "not-found-in-audio", "endOffset": 60941, "startOffset": 60937, "word": "type" }

    • To accommodate this, with input from the MGM team, we implemented an algorithm that checks how far ahead in the transcript the next "successful" match is.

      • We then add the average time per word, (Next Success Start - Last Success End) / (# of words away), to the previous word's time to get the new start and end times for the unfound words.
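
The interpolation described in the notes above could be sketched roughly as follows. This is an illustrative sketch only, not the code in gentle_forced_alignment.py. The function name `fill_unaligned_times` is hypothetical, and several details are assumptions: that aligned words carry `"case": "success"` with `start` and `end` times in seconds, that "# of words away" means the number of unfound words in the run, that the timeline starts at 0 when no word has been aligned yet, and that trailing unfound words collapse onto the last known end time.

```python
from typing import Dict, List


def fill_unaligned_times(words: List[Dict]) -> List[Dict]:
    """Return a copy of `words` in which unaligned entries get estimated times.

    Assumption: aligned words have "case" == "success" plus "start"/"end"
    in seconds; any other case (e.g. "not-found-in-audio") has no times.
    """
    result = [dict(w) for w in words]  # shallow copies; input is not mutated
    last_end = 0.0  # assumption: start of timeline if nothing aligned yet
    i = 0
    while i < len(result):
        if result[i].get("case") == "success":
            last_end = result[i]["end"]
            i += 1
            continue

        # Scan ahead to find where this run of unaligned words stops.
        j = i
        while j < len(result) and result[j].get("case") != "success":
            j += 1

        # Start of the next successful match, or last_end if there is none
        # (assumption: trailing unaligned words get zero-length spans).
        next_start = result[j]["start"] if j < len(result) else last_end

        run_length = j - i  # assumed meaning of "# of words away"
        step = max(0.0, next_start - last_end) / run_length

        # Each unaligned word gets one equal slice of the gap.
        for k in range(run_length):
            result[i + k]["start"] = last_end + k * step
            result[i + k]["end"] = last_end + (k + 1) * step
        i = j
    return result
```

With this scheme, two unfound words sitting between a match that ends at 2.0 s and a match that starts at 4.0 s would each be given a one-second span. Dividing the gap evenly is the simplest reading of the averaging rule; the actual implementation may count the words differently.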

Document generated by Confluence on Feb 25, 2025 10:39
