TLA documentation
This page documents conventions, standards and relevant workflows used for archiving multimedia corpus data created by The Freiburg Research Group in Saami Studies in collaboration with other partners.
Intro
The Language Archive (TLA) is a unit of the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands. Among other things, it archives resources on endangered languages and cultures. Our documentation projects archive video and audio resources along with metadata at TLA.
- Link to IKDP
- Link to KSDP
- Link to PSDP
Workflow for Archiving at TLA
The following command converts an MTS file into a format accepted by TLA:
ffmpeg -i 00004.MTS -c:v libx264 -b:v 15M -acodec aac -b:a 192k 00004.mp4
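When a whole recording session has to be converted, the same settings can be applied in a loop. The following is a minimal sketch, not part of the established workflow: the function name `convert_mts_dir` and the `DRY_RUN` switch are hypothetical, and the script assumes the source files carry an upper-case `.MTS` extension.

```shell
#!/bin/sh
# Sketch: convert every .MTS file in a directory with the same ffmpeg settings
# as the single-file command. convert_mts_dir and DRY_RUN are hypothetical
# names introduced for this example only.

convert_mts_dir() {
    dir="${1:-.}"
    for src in "$dir"/*.MTS; do
        # Skip the unexpanded pattern when the directory has no .MTS files.
        [ -e "$src" ] || continue
        dst="${src%.MTS}.mp4"
        # Never overwrite an existing conversion.
        if [ -e "$dst" ]; then
            echo "skip: $dst already exists"
            continue
        fi
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be run.
            echo "ffmpeg -i $src -c:v libx264 -b:v 15M -acodec aac -b:a 192k $dst"
        else
            ffmpeg -i "$src" -c:v libx264 -b:v 15M -acodec aac -b:a 192k "$dst"
        fi
    done
}
```

Calling `DRY_RUN=1 convert_mts_dir /path/to/session` prints the commands without running them, which is a cheap way to check file names before starting a long conversion.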
Data Standards for Archiving at TLA
- Raw data
- Edit decision list in a machine readable format
- This we can't really provide, but in principle it can be reconstructed up to a point from Final Cut Pro XML
- Catalogue metadata
- Content metadata
Currently it is not possible to archive all raw video data at TLA, since the video formats produced by the cameras are usually not accepted for archiving. The accepted video formats are MPEG-1 and MPEG-2; MPEG-4 is also usable once a hinting track has been added. Question: What is the best way to add a hinting track to MPEG-4 files with open source tools (ffmpeg or MP4Box) in a way that is accepted by TLA?
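One candidate for the question above is the `-hint` option of GPAC's MP4Box, which adds RTP hint tracks to an existing MP4 file. The sketch below only shows how that option is invoked; the helper name `hint_mp4` is hypothetical, and whether TLA accepts the resulting files has not been confirmed.

```shell
#!/bin/sh
# Sketch: add RTP hint tracks to an MP4 file with GPAC's MP4Box.
# hint_mp4 is a hypothetical helper name. Acceptance of the output by TLA
# is an assumption that still has to be verified with the archive.

hint_mp4() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    if ! command -v MP4Box >/dev/null 2>&1; then
        echo "MP4Box (GPAC) is not installed" >&2
        return 127
    fi
    # -hint rewrites the file with a hint track added for each media track.
    MP4Box -hint "$file" || return 1
    # -info prints the track list, so the new hint tracks can be checked by eye.
    MP4Box -info "$file"
}
```

A call would look like `hint_mp4 00004.mp4`. ffmpeg's MP4 muxer also offers `-movflags rtphint`, which could be tried at conversion time instead; it is equally untested against TLA's checks.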
Access to Data Archived at TLA
Access to catalogue metadata at TLA is always free, although personal information on actors may be anonymized.
Raw data and content metadata (e.g. annotations) belong to one of the following four access levels:
- Open Resources
- Access is free (marked green)
- Restricted Open Resources
- Access is free after registration (marked yellow)
- Protected Open Resources
- Access can be requested (marked orange)
- (Closed Resources)
- No access (marked red)
Only the first three access levels are relevant to our data.