JacyHinoki - delph-in/docs GitHub Wiki

Jacy is being developed in cooperation with the Hinoki Treebank.

Corpora

| Name | ID | Full Name | # Sentences | # Words | Comments |
| --- | --- | --- | --- | --- | --- |
| mrs | 0 | MRS Test Suite | 136 | ??? | |
| tc | 100,000 | Tanaka Corpus | 150,341 | 1,756,825 | Includes English translations; 10 profiles (6-15) treebanked |

These treebanks are in the jacy/tsdb/gold directory. They may lag behind the most recent version of the grammar.
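To see which gold profiles are present in a local checkout, something like the following minimal sketch may help. It assumes that each profile is a subdirectory of `jacy/tsdb/gold` containing a `relations` file, as is usual for [incr tsdb()] profiles. The function name `list_profiles` is hypothetical, and the relative path has to be adjusted to wherever Jacy is checked out.

```python
from pathlib import Path


def list_profiles(gold_dir):
    """Return the sorted names of subdirectories that look like profiles.

    Assumption: a directory counts as a profile if it contains a
    `relations` file (the schema file of an [incr tsdb()] profile).
    """
    root = Path(gold_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "relations").is_file()
    )


if __name__ == "__main__":
    # Path taken from the text above, relative to the parent of the checkout.
    for name in list_profiles("jacy/tsdb/gold"):
        print(name)
```

An empty result most likely means the path is wrong or the checkout does not include the gold directory.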

If you want silver data, parsing the rest of the Tanaka Corpus is a good place to start.