Install TreeTagger - presemt-ntnu/transglobal GitHub Wiki
This step can be skipped if you have downloaded a copy of the local data and do not want to build your own local data.
TreeTagger is required for part-of-speech tagging and lemmatization for several languages, currently English and German. Follow the instructions on how to install TreeTagger. Note that the binaries and parameter files differ according to your platform (Mac OS X, Linux).
English
Install the English parameter file (Latin1 character encoding, Penn Treebank tag set).
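As a quick sanity check after installation, the following minimal sketch tests whether a tagger wrapper script can be found on your `PATH`. The script name `tree-tagger-english` is an assumption based on the standard TreeTagger distribution, and `missing_commands` is a hypothetical helper, not part of transglobal.

```python
import shutil


def missing_commands(names):
    """Return the command names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed wrapper script name; adjust if your installation differs.
    missing = missing_commands(["tree-tagger-english"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tagger commands found.")
```

The same helper can be called with the German command names once that parameter file is installed.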
German
There are two options:
- Install the German parameter file with UTF-8 character encoding: This does not work out of the box and therefore requires some extra work. Follow the instructions on this page. The advantage is that the tagger can handle any input with UTF-8 characters and will therefore not mangle parts of the input.
- Install the German parameter file with Latin1 character encoding: This works out of the box and is therefore the easier option. The disadvantage is that the tagger cannot properly handle input containing non-Latin1 characters and may therefore mangle parts of the input. For this option, you need to change the default setting in the configuration file located in
env/tg-default.cfg
from
[tagger] # tagger/lemmatizer
[[de]]
command = tree-tagger-german-utf8
encoding = utf-8
# Use the following for non-utf8:
# command = tree-tagger-german
# encoding = latin1
to
[tagger] # tagger/lemmatizer
[[de]]
#command = tree-tagger-german-utf8
#encoding = utf-8
# Use the following for non-utf8:
command = tree-tagger-german
encoding = latin1
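To get a feeling for what the Latin1 option can and cannot handle, the sketch below lists the characters in a text that have no Latin1 representation. It only illustrates the encoding limitation; how the tagger actually treats such characters is not shown here, and `latin1_losses` is a hypothetical helper name.

```python
def latin1_losses(text):
    """Return the characters in `text` that cannot be encoded as Latin1."""
    lost = []
    for char in text:
        try:
            char.encode("latin-1")
        except UnicodeEncodeError:
            # No Latin1 code point exists for this character.
            lost.append(char)
    return lost


if __name__ == "__main__":
    # German umlauts and sharp s are part of Latin1, so nothing is lost.
    print(latin1_losses("Straße in Köln"))
    # Polish characters such as "Ł" and "ź" fall outside Latin1.
    print(latin1_losses("Łódź"))
```

If your German input is likely to contain such characters (for example foreign names or typographic quotes), the UTF-8 option may be worth the extra setup work.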