XML - redewiedergabe/corpus GitHub Wiki
Folder: data/xml
This version of the corpus consists of TEI-compliant XML files. Each file contains the text of one sample and the annotations of the Redewiedergabe project.
For a more detailed explanation of the annotation structure, see Annotation structure.
- TEI header with general information
- `<fs>` tag with metadata in the format defined by the Redewiedergabe project (see Metadata)
- The sample text contains the following XML tags:
  - `<said>` (STWR annotation)
  - `<seg>` (frame/speaker/intExpr annotation)
  - `<note>` (footnote text)
  - `<p>` (paragraph; always covers the full sample text)
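As a rough illustration of this file layout, the following sketch reads a sample with Python's standard `xml.etree.ElementTree` and collects the text of every `<said>` element. The embedded sample is hypothetical and heavily shortened, and the TEI namespace on the root element is an assumption about the corpus files; the helper `local_name` strips any namespace so the lookup works either way.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened sample (not taken from the corpus).
# Real files carry a full TEI header and metadata.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <text><body>
    <p>Er sagte: <said id="1" mode="direct" content="speech" level="1">Ich komme.</said><note>Eine Fussnote.</note></p>
  </body></text>
</TEI>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so tags match with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def iter_tags(root, name):
    """Yield all elements below root whose local tag name equals name."""
    return (el for el in root.iter() if local_name(el.tag) == name)


root = ET.fromstring(SAMPLE)

# itertext() also picks up text of nested tags inside a <said> element.
said_texts = ["".join(el.itertext()) for el in iter_tags(root, "said")]
note_texts = ["".join(el.itertext()) for el in iter_tags(root, "note")]

print(said_texts)
print(note_texts)
```

To read an actual corpus file, `ET.parse(path).getroot()` can replace the `ET.fromstring(SAMPLE)` call.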
We created a RELAX NG schema that adapts the TEI module for linguistic corpora to the annotation schema of the Redewiedergabe project. The tables below list the extensions we defined.
Schema and Documentation: data/xml/resources
Attributes of the `<said>` tag:

Attribute | Values | Description |
---|---|---|
id | number | ID |
mode | direct, freeIndirect, indirect, reported, indirect freeIndirect | STWR type |
content | speech, thought, writing (and combinations, separated by whitespace) | STWR medium |
level | number (starts with 1) | nesting depth of this STWR annotation |
nonFact | yes/no | non-factual STWR, e.g. negated |
prag | yes/no | STWR with a different pragmatic function, e.g. rhetorical figures |
metaph | yes/no | metaphorical STWR |
border | unspec/state/percept | borderline cases of STWR (typically for thought representation), e.g. perceptions |
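The STWR attributes in the table above could be read into a small record type as sketched below. The class `Stwr`, the function `parse_said`, the fallback values for missing attributes, and the example fragment are all hypothetical and not part of the corpus or its tooling. Because `mode` and `content` may hold several whitespace-separated values, the sketch splits them into lists.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Stwr:
    """Hypothetical container for the attributes of one <said> element."""
    id: str
    mode: list[str]      # e.g. ["indirect", "freeIndirect"]
    content: list[str]   # e.g. ["speech", "writing"]
    level: int           # nesting depth, starts at 1
    non_fact: bool
    text: str


def parse_said(el) -> Stwr:
    a = el.attrib
    return Stwr(
        id=a.get("id", ""),
        mode=a.get("mode", "").split(),
        content=a.get("content", "").split(),
        level=int(a.get("level", "1")),        # default is an assumption of this sketch
        non_fact=a.get("nonFact") == "yes",    # a missing attribute is treated as "no"
        text="".join(el.itertext()),
    )


# Hypothetical fragment, not a real corpus annotation.
FRAGMENT = (
    '<p>Sie dachte nicht, <said id="2" mode="indirect" content="thought" '
    'level="1" nonFact="yes">dass er <said id="3" mode="reported" '
    'content="speech writing" level="2">abgesagt</said> hatte</said>.</p>'
)

root = ET.fromstring(FRAGMENT)

# "{*}" matches the tag in any namespace or none (Python 3.8+).
annotations = [parse_said(el) for el in root.findall(".//{*}said")]

for ann in annotations:
    print(ann.id, ann.mode, ann.content, ann.level, ann.non_fact)
```

The remaining yes/no attributes (`prag`, `metaph`) and `border` could be handled the same way as `nonFact`.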
Attributes of the `<seg>` tag:

Attribute | Values | Description |
---|---|---|
id | number | ID |
pos | start, mid, end | if type=frame: position of the frame relative to its STWR |
The attribute `type` of the `<seg>` tag specifies which information the tag is used for. Options are: frame, speaker, intExpr.
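A minimal sketch of how `<seg>` elements might be grouped by their `type` attribute is shown below. The fragment is hypothetical, including the way the segments are laid out around the `<said>` element, and the variable `by_type` exists only for this illustration. `pos` is read only where present, since the table describes it for `type=frame` only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment, not a real corpus annotation.
FRAGMENT = (
    '<p><seg id="1" type="frame" pos="start">Er sagte:</seg> '
    '<said id="1" mode="direct" content="speech" level="1">Ich komme.</said> '
    '<seg id="2" type="speaker">Anna</seg> '
    '<seg id="2" type="intExpr">nickte</seg>.</p>'
)

root = ET.fromstring(FRAGMENT)

# Map each type (frame, speaker, intExpr) to a list of (text, pos) pairs.
by_type = defaultdict(list)
for seg in root.findall(".//{*}seg"):  # "{*}" = any namespace or none (Python 3.8+)
    text = "".join(seg.itertext())
    by_type[seg.get("type")].append((text, seg.get("pos")))  # pos is None if absent

for seg_type, items in by_type.items():
    print(seg_type, items)
```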