Output Description File - VincTheSecond/rextractor GitHub Wiki

The XML document describes the entities and relations detected in the input document. Each entity is defined as a list of text chunks (element annotation in the HTML document), and each relation is defined as a set of entities. The elements used in the XML document are described below:

document
- Root element.
- No attributes.
document → metadata
- Document metadata section. Its content is not yet defined.
- No attributes.
document → entities
- List of entities detected in the input document.
- No attributes.
document → entities → entity
- Description of one entity detected in the document.
- Attributes:
  - entity_id
    - Unique id of the entity. This id is used to address the entity in relation definitions.
  - dbe_id
    - Identifier of the entry in the Database of Entities. Defined only if the entity was detected by the Entity component.
  - chunk_ids
    - List of chunks (element in HTML document) that make up the entity.
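
The entity attributes above could be read with Python's standard `xml.etree.ElementTree` module. This is only a sketch under assumptions: the sample document, its id values (`e1`, `dbe42`, `c1`, ...) and the helper names `split_ids` and `read_entities` are hypothetical, and the separator used inside `chunk_ids` is not specified on this page, so the code accepts commas and whitespace.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample output; the id values and the list separator are assumptions.
SAMPLE = """<document>
  <metadata/>
  <entities>
    <entity entity_id="e1" dbe_id="dbe42" chunk_ids="c1,c2"/>
    <entity entity_id="e2" chunk_ids="c3"/>
  </entities>
  <relations/>
</document>"""


def split_ids(value):
    """Split an id list on commas and/or whitespace (assumed separators)."""
    return [part for part in re.split(r"[,\s]+", value or "") if part]


def read_entities(xml_text):
    """Return a dict mapping entity_id to its dbe_id and list of chunk ids."""
    root = ET.fromstring(xml_text)  # root is the <document> element
    entities = {}
    for entity in root.findall("./entities/entity"):
        entities[entity.get("entity_id")] = {
            # dbe_id is optional, so .get() yields None when it is absent
            "dbe_id": entity.get("dbe_id"),
            "chunks": split_ids(entity.get("chunk_ids")),
        }
    return entities


entities = read_entities(SAMPLE)
```

Keying the result by `entity_id` makes it straightforward to resolve the ids that relation definitions refer to.
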
document → entities → entity → dependency_tree
- Definition of the dependency tree for the entity. Each token of the entity has its own node in the tree. Each node (element ) is described by several attributes:
  - form
    - Original form of the token.
  - lemma
    - Base form of the token.
  - ord
    - Ordinal number (position) of the token in the entity.
  - parent
    - Ordinal number of the parent token.
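
Because every node carries its own `ord` and the `ord` of its parent, the tree can be rebuilt from the attributes alone. The sketch below is illustrative only: the element name `node`, the sample tokens, and the use of `parent="0"` for the root token are assumptions that this page does not confirm, and `children_by_parent` is a hypothetical helper. To avoid depending on the element name, the code picks up any descendant that has an `ord` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the element name "node" and parent="0" for the
# root token are assumptions, not taken from the format description.
SAMPLE_TREE = """<dependency_tree>
  <node form="courts" lemma="court" ord="2" parent="0"/>
  <node form="regional" lemma="regional" ord="1" parent="2"/>
</dependency_tree>"""


def children_by_parent(tree_xml):
    """Return (tokens, children): token data by ord, and child ords by parent ord."""
    tree = ET.fromstring(tree_xml)
    tokens = {}
    children = {}
    for element in tree.iter():
        if element.get("ord") is None:
            continue  # skips <dependency_tree> itself and any non-token element
        position = int(element.get("ord"))
        tokens[position] = (element.get("form"), element.get("lemma"))
        children.setdefault(int(element.get("parent")), []).append(position)
    return tokens, children


tokens, children = children_by_parent(SAMPLE_TREE)
```

With the two mappings in hand, walking the tree from the root is a matter of following `children` starting at whichever parent value marks the root in the actual output.
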
document → relations
- List of relations detected in the input document.
- No attributes.
document → relations → relation
- Description of one specific relation.
- Attributes:
  - relation_id
    - Unique relation identifier.
  - dbr_id
    - Id of the query that describes the relation and its RDF representation.
  - (subject|predicate|object)_ids
    - List of ids of the entities in the subject, predicate, or object position, respectively.
  - (subject|predicate|object)_concept
    - Concept of the entity used in the RDF transformation.
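
The relation attributes refer back to entities through their `entity_id` values, so a consumer would typically check those references while reading relations. The following is a minimal sketch under assumptions: the sample document, all id and concept values, the comma or whitespace separator in the `*_ids` lists, and the helper names `split_ids` and `read_relations` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample output; ids, concepts and the list separator are assumptions.
SAMPLE = """<document>
  <entities>
    <entity entity_id="e1" chunk_ids="c1"/>
    <entity entity_id="e2" chunk_ids="c2"/>
    <entity entity_id="e3" chunk_ids="c3"/>
  </entities>
  <relations>
    <relation relation_id="r1" dbr_id="q7"
              subject_ids="e1" predicate_ids="e2" object_ids="e3"
              subject_concept="Person" predicate_concept="hasDuty"
              object_concept="Duty"/>
  </relations>
</document>"""


def split_ids(value):
    """Split an id list on commas and/or whitespace (assumed separators)."""
    return [part for part in re.split(r"[,\s]+", value or "") if part]


def read_relations(xml_text):
    """Return one record per relation, checking that referenced entities exist."""
    root = ET.fromstring(xml_text)
    known = {e.get("entity_id") for e in root.findall("./entities/entity")}
    relations = []
    for relation in root.findall("./relations/relation"):
        record = {
            "relation_id": relation.get("relation_id"),
            "dbr_id": relation.get("dbr_id"),
        }
        # The (subject|predicate|object)_ids and _concept attributes share a pattern.
        for role in ("subject", "predicate", "object"):
            ids = split_ids(relation.get(role + "_ids"))
            missing = [entity_id for entity_id in ids if entity_id not in known]
            if missing:
                raise ValueError(f"{role}_ids refers to unknown entities: {missing}")
            record[role] = {"ids": ids, "concept": relation.get(role + "_concept")}
        relations.append(record)
    return relations


relations = read_relations(SAMPLE)
```

Raising on an unknown entity id is a design choice for this sketch; a more tolerant reader could log the problem and skip the relation instead.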
