Corplist XML - czcorpus/kontext GitHub Wiki

Corplist.XML

The corplist.xml file contains the definitions for a specific corpus. It defines some default behaviour as well as keywords and related interfaces.

Spoken corpora

A corpus with both audio and overlapping speech

<corpus
  sentence_struct="sp"
  speaker_id_attr="sp.oznacenishody"
  speech_segment="seg.soundfile"
  speech_overlap_attr="sp.prekryv"
  speech_overlap_val="ano" 
  ident="ORAL2013" />

A corpus without audio and without overlapping

(...but we still want a speech-based KWIC detail rendering)

<corpus 
  sentence_struct="sp" 
  speech_segment="sp." 
  speaker_id_attr="sp.num" 
  ident="ORAL2008" 
  tagset="pp_tagset" />

Please note the trailing dot in "sp." assigned to "speech_segment". It tells KonText that speeches are defined (by the "sp" structure) but that there are no attributes defining speech audio.
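To make the "structure.attribute" convention above more concrete, here is a minimal Python sketch that reads a `<corpus>` element and splits the `speech_segment` value at the dot. It is only an illustration of the convention, not KonText's actual implementation; the function name `describe_speech_segment` and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET


def describe_speech_segment(corpus_xml: str) -> dict:
    """Interpret the speech_segment attribute of a <corpus> element.

    Hypothetical helper: the value is assumed to have the form
    "structure.attribute", where an empty attribute part (e.g. "sp.")
    means that speeches are defined but no audio attribute exists.
    """
    elem = ET.fromstring(corpus_xml)
    value = elem.get("speech_segment", "")
    # Split at the first dot: "seg.soundfile" -> ("seg", "soundfile"),
    # "sp." -> ("sp", "")
    struct, _, attr = value.partition(".")
    return {
        "ident": elem.get("ident"),
        "speech_struct": struct or None,
        "audio_attr": attr or None,
        "has_audio": bool(attr),
    }


with_audio = describe_speech_segment(
    '<corpus ident="ORAL2013" sentence_struct="sp" '
    'speech_segment="seg.soundfile" />'
)
without_audio = describe_speech_segment(
    '<corpus ident="ORAL2008" sentence_struct="sp" speech_segment="sp." />'
)

print(with_audio)
print(without_audio)
```

With the two example values from this page, the first call reports the structure "seg" with the audio attribute "soundfile", while the second reports the structure "sp" with no audio attribute.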