Annotation structure - redewiedergabe/corpus GitHub Wiki

This is a short overview over the annotations used in the "Redewiedergabe" corpus, their structure and names.

To really understand the meaning and usage of these categories, we strongly recommend consulting the detailed annotation guidelines on our project homepage (in German).

Spelling and formatting of the attribute names differ slightly in the output formats column-based text format and XML format (cf. documentation of these formats), but this page explains the general structure and caveats.

Note (Footnote text)

The samples of the "Redewiedergabe" corpus sometimes contain footnote text that interrupts the main text. Those are marked with the annotation note. This structural annotation was copied from the original full texts in order to ensure that the footnote text can be separated from the main text if necessary.

Warning: Footnote text can interrupt sentences! Sentences splitting information (in the column-based format) is not correct in these cases.

Annotation STWR (Speech, Thought, Writing Representation)

Main attributes

These attributes are obligatory for each STWR annotation.

Attribute Values Possible combined values Description
type direct, freeIndirect, indirect, reported indirect_freeIndirect STWR type
medium speech, thought, writing speech_thought, speech_writing, thought_writing, speech_thought_writing STWR medium
id Number ID; refers to the IDs of frame, speaker and intExpr and can also link interrupted STWR annotations (e.g. a direct STWR that is separated by a frame).
level Number (starts with 1) nesting depth of this STWR annotation; value=1 means highest level

Note: The combined values are separated by _ in the column-based format, but by whitespace in the XML format.

Secondary attributes

These attributes are optional.

Attribute Values Description
non-fact yes non-factual STWR, e.g. negated
border state, percept, unspec borderline cases of STWR (typically for thought representation), e.g. perceptions
prag yes STWR with a different pragmatic function, e.g. rhetorical figures
metaph yes metaphorical STWR

Annotations frame, speaker and intExpr

Attribute Values Description
id Number ID; links the annotations to each other as well as to one or more STWR annotations
pos start, mid, end only for frame: position of the frame relative to its STWR; start=before the STWR, mid=interrupts the STWR, end=after the STWR

In rare cases, frame annotations can appear without a linked STWR, if they are positioned at the end of a sample. In these cases, the ID has value zero.

Speaker annotations can have more than one ID, if they are associated with several different STWR annotations.

Frame annotations can only be linked to STWR annotations with the types direct or indirect. IntExpr annotations can additionally be linked to STWR annotations with the type reported. Speaker annotations can be linked to STWR annotations of any type.

IntExpr annotations only appear within frame annotations or within STWR annotations of the type reported.

Generally, each frame annotation is linked to one IntExpr and one Speaker annotation. However, there are exceptions to this rule:

  • There can be several IntExpr annotations for one frame annotation. These are either coordinated elements ("er bat und bettelte") or parts of a phrase or a complex verb that are separated ("er rief laut aus").
  • There can be several Speaker annotations for one frame annotation. These are coordinated elements (e.g. several different people).
  • There may be frame annotations that have neither Speaker nor IntExpr (if those could not be identified).

Additional structural remarks

  • STWR annotations are often nested. In the beta release, the maximum nesting depth is level=5 (one occurance).
  • In very rare cases, Frame annotations can also be nested (one occurance in the beta release).