Corpus formats - NatLibFi/Annif GitHub Wiki

Annif works with two kinds of corpora: subject vocabularies and document corpora.

  • Subject vocabulary corpora define the set of possible subjects (concepts) that can be assigned to documents. These are typically SKOS or TSV files. See Subject vocabulary formats.
  • Document corpora are collections of documents (with or without assigned subjects) used for training, evaluation, or testing. See Document corpus formats.
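
As a rough illustration of the TSV option for subject vocabularies, the sketch below reads a vocabulary into a dictionary. It assumes a two-column layout (a subject URI, optionally wrapped in angle brackets, then a tab, then a label); that layout, the function name `read_tsv_vocabulary`, and the `example.org` URIs are assumptions made for this example. The Subject vocabulary formats page is the authoritative description.

```python
import csv
import io


def read_tsv_vocabulary(stream):
    """Map subject URI -> label from a two-column TSV stream (assumed layout)."""
    subjects = {}
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and rows without both a URI and a label.
        if len(row) < 2:
            continue
        uri = row[0].strip()
        # Drop the surrounding angle brackets if the URI has them.
        if uri.startswith("<") and uri.endswith(">"):
            uri = uri[1:-1]
        subjects[uri] = row[1].strip()
    return subjects


# Hypothetical sample data, not taken from a real vocabulary.
sample = (
    "<http://example.org/subject/1>\tcats\n"
    "<http://example.org/subject/2>\tdogs\n"
)
vocabulary = read_tsv_vocabulary(io.StringIO(sample))
print(vocabulary)
```

The same function can be given an open file object instead of the in-memory sample. A SKOS vocabulary would need an RDF parser and is not covered by this sketch.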
