TLA - langdoc/FRechdoc GitHub Wiki

TLA documentation

This page documents the conventions, standards, and workflows used for archiving multimedia corpus data created by the Freiburg Research Group in Saami Studies in collaboration with other partners.

Intro

The Language Archive (TLA) is a unit of the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands, which, among other things, archives resources on endangered languages and cultures. Our documentation projects archive video and audio resources, along with their metadata, at TLA.

  • Link to IKDP
  • Link to KSDP
  • Link to PSDP

Workflow for Archiving at TLA

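The following command converts a camera MTS file into an H.264/AAC MPEG-4 (MP4) file. Note that TLA accepts MPEG-4 only when a hint track has been added (see Data Standards for Archiving at TLA below):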

ffmpeg -i 00004.MTS -c:v libx264 -b:v 15M -c:a aac -b:a 192k 00004.mp4
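For a whole recording session, the single-file command can be wrapped in a loop. This is a minimal sketch rather than an established part of the workflow: it assumes a POSIX shell, ffmpeg on the PATH, and camera files ending in .MTS or .mts. The output naming (same base name with .mp4) mirrors the example above; the skip-if-exists check is an illustrative choice.

```shell
#!/bin/sh
# Sketch only: batch-convert camera .MTS files to H.264/AAC MP4 using the
# same encoder settings as the single-file command.
# Assumptions: POSIX sh, ffmpeg on PATH, files named *.MTS or *.mts.
set -eu

# Derive the output name by swapping the extension: 00004.MTS -> 00004.mp4
mp4_name_for() {
  printf '%s.mp4\n' "${1%.*}"
}

convert_dir() {
  dir=$1
  for src in "$dir"/*.MTS "$dir"/*.mts; do
    # An unmatched glob stays literal, so skip names that do not exist.
    [ -e "$src" ] || continue
    dst=$(mp4_name_for "$src")
    if [ -e "$dst" ]; then
      echo "skip: $dst already exists" >&2
      continue
    fi
    # H.264 video at 15 Mbit/s, AAC audio at 192 kbit/s;
    # -nostdin disables ffmpeg's interactive keyboard handling.
    ffmpeg -nostdin -i "$src" -c:v libx264 -b:v 15M -c:a aac -b:a 192k "$dst"
  done
}

# Process the directory given as first argument, or the current directory.
convert_dir "${1:-.}"
```

Usage, for example: sh mts2mp4.sh /path/to/session (the script name is hypothetical).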

Data Standards for Archiving at TLA

  • Raw data
  • Edit decision list in a machine-readable format
    • We cannot really provide this, but in principle it can be reconstructed, up to a point, from Final Cut Pro XML
  • Catalogue metadata
  • Content metadata

Currently, not all raw video data can be archived at TLA, since the video formats produced by our cameras are usually not accepted for archiving. Accepted video formats are MPEG-1 and MPEG-2; MPEG-4 can also be used if a hint track has been added. Open question: what is the best way to add a hint track to MPEG-4 files with open-source tools (ffmpeg or MP4Box) so that TLA accepts them?
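One possible starting point for the open question, offered as an untested sketch: GPAC's MP4Box has a -hint option that adds RTP hint tracks, and ffmpeg's MP4 muxer offers the rtphint movflag for the same purpose. Whether TLA accepts files hinted either way has not been confirmed. The _hinted.mp4 naming convention and the preference for MP4Box over ffmpeg are assumptions made for illustration.

```shell
#!/bin/sh
# Untested sketch: add RTP hint tracks to MP4 files, because TLA accepts
# MPEG-4 only with a hint track. TLA acceptance of the output is NOT confirmed.
# Assumptions: POSIX sh; GPAC's MP4Box or ffmpeg on PATH.
set -eu

# Hypothetical naming convention: clip.mp4 -> clip_hinted.mp4
hinted_name_for() {
  printf '%s_hinted.mp4\n' "${1%.*}"
}

add_hint_track() {
  src=$1
  dst=$(hinted_name_for "$src")
  if command -v MP4Box >/dev/null 2>&1; then
    # MP4Box: -hint adds RTP hint tracks, -out writes to a new file.
    MP4Box -hint "$src" -out "$dst"
  elif command -v ffmpeg >/dev/null 2>&1; then
    # ffmpeg: copy audio/video unchanged; the rtphint movflag makes the
    # MP4 muxer add RTP hint tracks.
    ffmpeg -nostdin -i "$src" -c copy -movflags +rtphint "$dst"
  else
    echo "error: neither MP4Box nor ffmpeg found" >&2
    return 1
  fi
}

# Hint every file given on the command line; no arguments means no work.
for f in "$@"; do
  add_hint_track "$f"
done
```

Usage, for example: sh add_hint.sh 00004.mp4 (hypothetical script name). Before relying on either route, a test file should be checked with TLA.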

Access to Data Archived at TLA

Access to catalogue metadata at TLA is always free, although personal information on actors may be anonymized.

Raw data and content metadata (e.g. annotations) belong to one of the following four access levels:

  • Open Resources
    • Access is free (marked green)
  • Restricted Open Resources
    • Access is free after registration (marked yellow)
  • Protected Open Resources
    • Access can be requested (marked orange)
  • (Closed Resources)
    • No access (marked red)

Only the first three access levels are relevant to our data.