FinnTreeBank tagger - mpsilfve/FinnPos GitHub Wiki

FinnTreeBank tagger is a high-accuracy morphological tagger and lemmatizer for Finnish. It is trained on FinnTreeBank 1 and uses the open-source Finnish morphological analyzer OMorFi.

See the Build and install page for installation instructions.

The input format is one word per line, with punctuation marks on their own lines and sentences separated by an empty line. For example:

Koira
haukkuu
.

Kissa
naukuu
.
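If your text is stored with one pre-tokenized sentence per line, a small shell function can produce the format above. This is a minimal sketch, not part of FinnPos: the function name to_ftb_input is hypothetical, and it assumes that tokens (including punctuation) are already separated by whitespace, as in "Koira haukkuu ."; it does no real tokenization.

```shell
#!/bin/sh
# to_ftb_input is a hypothetical helper, not part of FinnPos.
# It assumes one sentence per input line with tokens (including
# punctuation) already separated by whitespace; no tokenization is done.
to_ftb_input() {
    awk 'NF > 0 {
        for (i = 1; i <= NF; i++) print $i   # one token per line
        print ""                             # empty line closes the sentence
    }' "$@"
}

# Example: produces the two-sentence input shown above
# (followed by a trailing empty line).
printf 'Koira haukkuu .\nKissa naukuu .\n' | to_ftb_input
```

The result can be piped straight into the tagger, e.g. to_ftb_input sentences.txt | ftb-label > output, where sentences.txt is a placeholder file name.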

To label sentences, pipe them into ftb-label:

cat input | ftb-label > output

Sometimes OMorFi returns two lemmas for the same analysis. For example, "Helsingin" can be analyzed as the singular genitive of either the proper noun "Helsinki" or the proper noun "Helsing". To output all alternative lemmas, use the option --all-lemmas. For example:

$ echo "Leijat Helsingin yllä ." | tr ' ' '\n' | ftb-label --all-lemmas
Leijat	_	leija	[POS=NOUN]|[NUM=PL]|[CASE=NOM]	_
Helsingin	_	helsinki|helsing	[POS=NOUN]|[PROPER=PROPER]|[NUM=SG]|[CASE=GEN]	_
yllä	_	yllä	[POS=ADVERB]	_
.	_	.	[POS=PUNCTUATION]	_

With --all-lemmas, the lemma column lists both alternatives separated by a vertical bar (|): helsinki|helsing.
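To post-process --all-lemmas output, the alternatives in the lemma column can be split on the | character. The sketch below is an illustration, not a FinnPos feature: the function name ftb_lemmas is hypothetical, and it assumes the tab-separated layout of the example above, with the word in the first column and the lemma(s) in the third.

```shell
#!/bin/sh
# ftb_lemmas is a hypothetical helper, not part of FinnPos. It assumes the
# tab-separated layout shown above: column 1 is the word and column 3 holds
# one or more lemmas joined by "|". It prints one "word<TAB>lemma" line per
# alternative lemma and skips the empty lines between sentences.
ftb_lemmas() {
    awk 'BEGIN { FS = "\t" }
    NF >= 3 {
        n = split($3, lemmas, "[|]")   # "helsinki|helsing" -> 2 lemmas
        for (i = 1; i <= n; i++) print $1 "\t" lemmas[i]
    }' "$@"
}

# Example on two lines copied from the output above.
printf 'Leijat\t_\tleija\t[POS=NOUN]|[NUM=PL]|[CASE=NOM]\t_\nHelsingin\t_\thelsinki|helsing\t[POS=NOUN]|[PROPER=PROPER]|[NUM=SG]|[CASE=GEN]\t_\n' | ftb_lemmas
```

In a real pipeline the helper would sit after the tagger, e.g. ftb-label --all-lemmas < input | ftb_lemmas. Splitting with the bracket expression "[|]" keeps the separator literal regardless of how a given awk treats a bare "|".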
