Corpus formats - NatLibFi/Annif GitHub Wiki
Annif uses two kinds of corpora: subject vocabulary corpora and document corpora.
- Subject vocabulary corpora define the set of possible subjects (concepts) that can be assigned to documents. These are typically SKOS or TSV files. See Subject vocabulary formats.
- Document corpora are collections of documents (with or without assigned subjects) used for training, evaluation, or testing. See Document corpus formats.