Forced Alignment - AudiovisualMetadataPlatform/amp_documentation GitHub Wiki

Forced Alignment

Forced Alignment is the process of adding timestamps to a text transcript, based on the transcript and the audio it transcribes.
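
As a rough illustration of the input/output relationship, the Python sketch below assumes that a speech recognizer has already produced (word, start, end) hypotheses for the audio. It uses the standard library's difflib.SequenceMatcher to copy those timestamps onto matching transcript words. This is a toy, not how Gentle or the AMP MGM works internally: real aligners model the audio acoustically and can time words a recognizer missed. The function name align and the hypothesis format are hypothetical.

```python
import difflib
from typing import List, Optional, Tuple

# Hypothetical ASR output: (word, start_seconds, end_seconds)
Hypothesis = Tuple[str, float, float]
Aligned = Tuple[str, Optional[float], Optional[float]]


def align(transcript: str, recognized: List[Hypothesis]) -> List[Aligned]:
    """Toy alignment: attach timestamps from recognized words to matching
    transcript words. Unmatched transcript words get (None, None)."""
    words = transcript.split()
    # Compare case-insensitively and ignore trailing punctuation
    norm = [w.lower().strip(".,?!;:") for w in words]
    rec_norm = [w.lower() for w, _, _ in recognized]

    result: List[Aligned] = [(w, None, None) for w in words]
    matcher = difflib.SequenceMatcher(a=norm, b=rec_norm, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is a run of block.size equal words: a[block.a:] == b[block.b:]
        for k in range(block.size):
            _, start, end = recognized[block.b + k]
            result[block.a + k] = (words[block.a + k], start, end)
    return result
```

For example, align("Hello world, hello again.", [("hello", 0.0, 0.4), ("world", 0.5, 0.9), ("again", 1.4, 1.8)]) returns timestamps for "Hello", "world," and "again.", and (None, None) for the second "hello", which has no matching recognized word.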

Inputs

  • Audio (mp3, wav, possibly other formats) 
  • Corresponding transcript in a text format with no time codes.

Output Formats

  • Gentle Transcript (json) - Transcript with time codes in the JSON format produced by Gentle.
  • AMP Transcript Aligned (json) - Aligned transcript in the AMP JSON format.
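
The Python sketch below shows one way the Gentle JSON might be mapped to an AMP-style transcript. It relies on fields Gentle reports for each word (word, case, start, end, startOffset). The AMP field names it writes (results, transcript, words, type, text, start, end, offset) are assumptions, since the AMP schema is not reproduced on this page, and gentle_to_amp is a hypothetical helper.

```python
import json
from typing import Any, Dict


def gentle_to_amp(gentle: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a parsed Gentle result into a hypothetical AMP-style transcript.

    ASSUMPTION: the AMP field names below are illustrative; check them
    against the actual AMP transcript schema before relying on this.
    """
    words = []
    for w in gentle.get("words", []):
        entry = {
            "type": "pronunciation",
            "text": w["word"],
            # Character offset of the word in the original transcript text
            "offset": w.get("startOffset"),
        }
        # Gentle only reports start/end for words it managed to align
        if w.get("case") == "success":
            entry["start"] = w["start"]
            entry["end"] = w["end"]
        words.append(entry)
    return {"results": {"transcript": gentle.get("transcript", ""), "words": words}}


if __name__ == "__main__":
    sample = json.loads(
        '{"transcript": "hi there", "words": ['
        '{"case": "success", "word": "hi", "start": 0.1, "end": 0.3, "startOffset": 0},'
        '{"case": "not-found-in-audio", "word": "there", "startOffset": 3}]}'
    )
    print(json.dumps(gentle_to_amp(sample), indent=2))
```

Words Gentle could not align are kept without start and end, so a reviewer can still see where they fall in the text.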

MGMs in AMP

Gentle Forced Alignment

The Gentle Forced Alignment MGM takes an AMP transcript and the audio file of the related item as input, and generates an AMP transcript with updated time codes.

Parameters: 

  • Audio (mp3, wav, possibly other formats) and transcript (plain text).
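
Gentle can run as a local web server that accepts the audio and the transcript as multipart form fields named audio and transcript; its README shows a curl call to http://localhost:8765/transcriptions?async=false. The Python sketch below uses only the standard library and assumes such a server is running at that address. The helper names build_multipart and align_with_gentle are hypothetical and not part of AMP.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Any, Dict, Tuple

# ASSUMPTION: a Gentle server running locally on its default port
GENTLE_URL = "http://localhost:8765/transcriptions?async=false"


def build_multipart(audio_path: str, transcript_text: str) -> Tuple[bytes, str]:
    """Build a multipart/form-data body with 'audio' (file) and 'transcript' (text) fields."""
    boundary = uuid.uuid4().hex
    audio_bytes = Path(audio_path).read_bytes()
    parts = [
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{Path(audio_path).name}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="transcript"\r\n\r\n',
        transcript_text.encode("utf-8"),
        b"\r\n",
        f"--{boundary}--\r\n".encode(),  # closing boundary
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def align_with_gentle(audio_path: str, transcript_text: str, url: str = GENTLE_URL) -> Dict[str, Any]:
    """POST audio + plain-text transcript to Gentle and return the parsed JSON result."""
    body, content_type = build_multipart(audio_path, transcript_text)
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Passing async=false asks Gentle to return the finished alignment in the response rather than a job to poll, which keeps the sketch simple at the cost of a long-running request for large files.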

Notes on Use

  • In AMP, this tool was created to correct the time codes of transcripts that have gone through the Human MGM for correction. The BBC transcript editor used in that correction process produces corrected speech with wrong time codes.

Use Cases and Example Workflows

Realigning a transcript with misaligned time codes

An item's transcript was corrected by a human using the BBC transcript editor. During correction, the person editing had to add several chunks of speech, which the BBC editor did not align with time codes. The CM (collection manager) wants the resulting transcript to go through Forced Alignment to fix these problems.
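
After realignment, the CM may want to check whether any of the added chunks still lack time codes. The Python sketch below is a minimal version, assuming the parsed Gentle JSON is available. It treats any word whose case is not "success" as unaligned (Gentle uses values such as "not-found-in-audio"), groups consecutive unaligned words into runs, and records the time window between the nearest aligned neighbours so each run can be located in the audio. The function unaligned_runs is hypothetical.

```python
from typing import Any, Dict, List, Optional


def unaligned_runs(gentle: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Group consecutive Gentle words whose case is not "success" into runs.

    "after" is the end time of the aligned word before the run and "before"
    is the start time of the aligned word after it (None at file edges).
    """
    runs: List[Dict[str, Any]] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    window_start: Optional[float] = None

    for w in gentle.get("words", []):
        if w.get("case") == "success":
            if current:
                # Close the open run at this aligned word's start time
                runs.append({"words": current, "after": window_start, "before": w["start"]})
                current = []
            prev_end = w["end"]
        else:
            if not current:
                window_start = prev_end  # a new run begins after the last aligned word
            current.append(w["word"])

    if current:  # run that extends to the end of the transcript
        runs.append({"words": current, "after": window_start, "before": None})
    return runs
```

A run with both bounds set points to a stretch of audio the CM can listen to directly; a run bounded by None sits at the very start or end of the item.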

AMP JSON Output

Schema


Sample output


Attachments:

Forced alignment workflow.png (image/png)

Document generated by Confluence on Feb 25, 2025 10:39