NER Editor HMGM

Named entity recognition (NER), or entity extraction, is a natural language processing (NLP) technique that attempts to identify and classify entities or concepts, such as people, places, organizations, products, and topics, in unstructured text. The AMP tool implementations take as input the outputs of speech-to-text, the HMGM transcript editor, and video OCR. Data outputs include entity text, entity type, character offsets, start time, and relevance score (if offered by the tool).
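To make the listed data outputs concrete, the following is a minimal Python sketch of one entity record. The class name, field names, and example values are assumptions chosen for illustration and are not the actual AMP JSON schema; see Named Entity Recognition - NER for the real format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Entity:
    """One recognized entity (hypothetical field names, not the AMP schema)."""
    text: str                      # entity text as it appears in the transcript
    type: str                      # entity type, e.g. "PERSON" or "LOCATION"
    begin_offset: int              # character offset where the entity starts
    end_offset: int                # character offset where the entity ends
    start: float                   # start time in the media, in seconds
    score: Optional[float] = None  # relevance score, only if the tool offers one


def entity_to_dict(entity: Entity) -> dict:
    """Convert an entity to a dict, leaving out fields the tool did not supply."""
    return {key: value for key, value in asdict(entity).items() if value is not None}


# Illustrative values only.
example = Entity(text="Bloomington", type="LOCATION",
                 begin_offset=12, end_offset=23, start=4.25)
print(json.dumps(entity_to_dict(example), indent=2))
```

Leaving the score out when it is missing mirrors the "if offered by the tool" condition above; a real implementation might instead keep the key with a null value.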

Inputs

  • amp_transcript file (or one of its variations: amp_transcript_adjusted, amp_transcript_aligned, or amp_transcript_corrected) from a Speech-to-text MGM or a text file uploaded as an intermediary file

Output Formats

  • amp_entities_corrected: Corrected entities in AMP JSON format

NER Editor in AMP

Avalon Timeliner

Avalon Timeliner is an open-source A/V annotation and analysis tool for creating and labeling segments of time-based media. The segmentation can be used to navigate A/V media for detailed study. AMP uses the "marker" function of the timeliner to mark time locations in the media file for the Entities recognized by the NER tool. The human reviewer of the NER output uses the marker features to correct or remove entries and also to add new ones.
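The three reviewer actions described above (correct, remove, add) can be sketched as operations on a list of time markers. This is an illustrative Python model only: the Marker class and the function names are hypothetical and do not reflect the Timeliner's own data format or API.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Marker:
    """A labeled point in the media (hypothetical model, not Timeliner's format)."""
    label: str   # entity text shown to the reviewer
    time: float  # location in the media file, in seconds


def correct_marker(markers: List[Marker], index: int, label: str) -> List[Marker]:
    """Return a new list with the label of the marker at `index` replaced."""
    return [replace(m, label=label) if i == index else m
            for i, m in enumerate(markers)]


def remove_marker(markers: List[Marker], index: int) -> List[Marker]:
    """Return a new list without the marker at `index`."""
    return [m for i, m in enumerate(markers) if i != index]


def add_marker(markers: List[Marker], marker: Marker) -> List[Marker]:
    """Return a new list with `marker` added, kept in time order."""
    return sorted(markers + [marker], key=lambda m: m.time)


# Illustrative review session: fix a misspelling, drop a false hit, add a miss.
markers = [Marker("Jon Smith", 10.5), Marker("the", 12.0), Marker("Chicago", 30.0)]
markers = correct_marker(markers, 0, "John Smith")
markers = remove_marker(markers, 1)
markers = add_marker(markers, Marker("Indiana University", 18.2))
for m in markers:
    print(f"{m.time:7.2f}s  {m.label}")
```

Each function returns a new list rather than modifying its input, so the original NER output stays available for comparison with the reviewed version.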

Notes on Use

  • HMGM tools use the task management tool set at the Unit level where the content file is located to create work tickets for the reviewers. The workflow then waits at this node until the review is completed in AMP. See Transcript Editor HMGM for details.

Use Cases and Example Workflows

Use Case: Correcting a Transcript for NER generation

A collection has already generated and corrected a transcript and would like to generate and correct a list of named entities. 

AMP JSON Output

See Named Entity Recognition - NER for detailed information on the format.

Attachments:

TRANSCRIPT_NER_WORKFLOW.png (image/png)

Document generated by Confluence on Feb 25, 2025 10:39