AlvisNLP ML data model - Bibliome/alvisnlp GitHub Wiki
The data structure holds the corpus contents and their annotations. It is passed from one module to the next, and each module instance can access it (read and write) through a shared object.
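The shared-object design can be illustrated with a minimal Python sketch: a single corpus object is created once per run and handed, in order, to each module, which modifies it in place. The `Corpus` stub, the `Module` protocol with its `process` method, and the two modules are hypothetical illustrations. They are not AlvisNLP/ML's actual module API.

```python
from typing import Dict, List, Protocol


class Corpus:
    """Hypothetical stand-in for the shared data structure (one object per run)."""

    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}  # document identifier -> text
        self.features: Dict[str, List[str]] = {}


class Module(Protocol):
    """Hypothetical module interface: reads and/or writes the corpus in place."""

    def process(self, corpus: Corpus) -> None: ...


class LoadDocuments:
    """Hypothetical module that writes documents into the shared corpus."""

    def __init__(self, texts: Dict[str, str]) -> None:
        self.texts = texts

    def process(self, corpus: Corpus) -> None:
        corpus.documents.update(self.texts)


class CountDocuments:
    """Hypothetical module that reads what earlier modules wrote, then records a result."""

    def process(self, corpus: Corpus) -> None:
        count = str(len(corpus.documents))
        corpus.features.setdefault("document-count", []).append(count)


def run(modules: List[Module]) -> Corpus:
    corpus = Corpus()  # created once for the whole run
    for module in modules:
        module.process(corpus)  # every module receives the same instance
    return corpus


result = run([LoadDocuments({"d1": "First text.", "d2": "Second text."}), CountDocuments()])
print(result.features["document-count"])  # prints ['2']
```

In this sketch, `CountDocuments` only sees two documents because `LoadDocuments` ran before it on the same object; nothing is copied between modules.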
The following figure presents a UML-like specification of the AlvisNLP/ML data structure.

- Corpus: a Corpus object represents a collection of documents. In an AlvisNLP/ML run, the corpus is a single object passed from module to module. A Corpus object has features and documents.
- Document: a Document object represents a single document. Each document has an identifier that is unique within the corpus. A Document object has features and sections.
- Section: a Section object holds a piece of the document's text contents. Each section has a name, contents, features, layers, and relations.
- Layer: a Layer object is an annotation container. A Layer object has a name that is unique within the section.
- Annotation: an Annotation object represents a span of text created by a module. Each annotation belongs to at least one layer. An Annotation object has features, plus a start and an end that give the coordinates of the annotation in the section's contents.
- Relation: a Relation object is a tuple container. A Relation object has features and a name that is unique within the section.
- Tuple: a Tuple object represents a relation between several elements of the data structure. A Tuple object has several arguments; each argument is an element (Corpus, Document, Section, Relation, but most often Annotation or Tuple) accessed through a role name. A Tuple object also has features.
- Features are key-value pairs that carry information on an element's type, tags, or properties. Feature keys are not unique within an element: the same key may hold several values, and reading that key returns the last value.
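The element types above can be mirrored in a minimal Python sketch. It is not the AlvisNLP/ML API: the method names (`add_document`, `add_section`, `layer`, `relation`, `text`, `get_all`) are hypothetical, and the sketch assumes end-exclusive character offsets for annotations. Layers store references, so one Annotation object can belong to several layers, and `Features` keeps every value of a key while `get` returns the last one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

class Features:
    """Key-value pairs in which one key may hold several values."""

    def __init__(self) -> None:
        self._values: Dict[str, List[str]] = {}

    def add(self, key: str, value: str) -> None:
        self._values.setdefault(key, []).append(value)

    def get(self, key: str) -> Optional[str]:
        # Reading a key returns its last value, as described above.
        values = self._values.get(key)
        return values[-1] if values else None

    def get_all(self, key: str) -> List[str]:
        return list(self._values.get(key, []))

@dataclass(eq=False)
class Annotation:
    start: int  # assumed: offset of the first character in the section contents
    end: int  # assumed: end-exclusive offset
    features: Features = field(default_factory=Features)

@dataclass(eq=False)
class Tuple:
    # role name -> argument (any element, most often an Annotation or a Tuple)
    arguments: Dict[str, object] = field(default_factory=dict)
    features: Features = field(default_factory=Features)

@dataclass(eq=False)
class Layer:
    name: str
    annotations: List[Annotation] = field(default_factory=list)  # shared references

@dataclass(eq=False)
class Relation:
    name: str
    tuples: List[Tuple] = field(default_factory=list)
    features: Features = field(default_factory=Features)

@dataclass(eq=False)
class Section:
    name: str
    contents: str
    features: Features = field(default_factory=Features)
    layers: Dict[str, Layer] = field(default_factory=dict)
    relations: Dict[str, Relation] = field(default_factory=dict)

    def layer(self, name: str) -> Layer:
        # Layer names are unique in a section: return the existing layer if any.
        return self.layers.setdefault(name, Layer(name))

    def relation(self, name: str) -> Relation:
        # Relation names are unique in a section too.
        return self.relations.setdefault(name, Relation(name))

    def text(self, annotation: Annotation) -> str:
        return self.contents[annotation.start:annotation.end]

@dataclass(eq=False)
class Document:
    identifier: str
    features: Features = field(default_factory=Features)
    sections: List[Section] = field(default_factory=list)

    def add_section(self, name: str, contents: str) -> Section:
        self.sections.append(Section(name, contents))
        return self.sections[-1]

@dataclass(eq=False)
class Corpus:
    features: Features = field(default_factory=Features)
    documents: Dict[str, Document] = field(default_factory=dict)

    def add_document(self, identifier: str) -> Document:
        # Document identifiers are unique in the corpus.
        if identifier in self.documents:
            raise ValueError(f"duplicate document identifier: {identifier}")
        self.documents[identifier] = Document(identifier)
        return self.documents[identifier]

# Usage: one annotation may belong to several layers.
corpus = Corpus()
document = corpus.add_document("doc1")
section = document.add_section("title", "Bacillus subtilis grows in soil.")
organism, habitat = Annotation(0, 17), Annotation(27, 31)
section.layer("organisms").annotations.append(organism)
section.layer("habitats").annotations.append(habitat)
section.layer("entities").annotations.extend([organism, habitat])
organism.features.add("type", "taxon")
organism.features.add("type", "bacterium")
section.relation("lives_in").tuples.append(Tuple({"organism": organism, "habitat": habitat}))
print(section.text(organism), "/", organism.features.get("type"))  # prints: Bacillus subtilis / bacterium
```

Keeping every value of a feature key, instead of overwriting it, is one way to honor the rule that keys are not unique while reads return the last value; an actual implementation may store features differently.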