Models - mikahama/uralicNLP GitHub Wiki
What models are there?
UralicNLP can currently use three kinds of models: HFST morphological generators, HFST morphological analysers and constraint grammar (CG) disambiguators. The HFST models are available for all supported languages, while CGs are available for only a few.
The models originate mostly from the Giellatekno repository and Apertium. Copyright belongs to the respective authors; however, everything provided by Giellatekno and Apertium is open source.
Downloading models
from uralicNLP import uralicApi
uralicApi.download("fin")
The above snippet downloads all the models for Finnish. Run it with sudo privileges for a system-wide installation.
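If you need models for more than one language, the same call can be repeated in a loop. The following is a minimal sketch; `download_languages` and its `download` parameter are hypothetical names introduced here, not part of uralicNLP. By default it uses `uralicApi.download`, but any callable can be passed in, which makes it possible to try the loop without network access.

```python
def download_languages(codes, download=None):
    """Call the downloader once per three-letter language code, in order."""
    if download is None:
        # Imported lazily so this helper can be defined without uralicNLP installed.
        from uralicNLP import uralicApi
        download = uralicApi.download
    for code in codes:
        download(code)
    return list(codes)
```

For example, `download_languages(["fin", "sms"])` would fetch the Finnish and Skolt Sami models one after the other.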
Where are models located?
from uralicNLP import uralicApi
print(uralicApi.__model_base_folders())
This gives you the list of possible locations for the models. If you want to create your own models, create a subdirectory in any of these locations, named with the three-letter language code of your language. Name your model files generator, analyser and cg, without file extensions.
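The directory layout described above can be set up by hand, or with a few lines of Python. The following is a minimal sketch using only the standard library; `install_custom_models` is a hypothetical helper, not part of uralicNLP, and it assumes your model files are already in a format uralicNLP can load.

```python
import shutil
from pathlib import Path

# File names expected inside the language directory (no extensions).
MODEL_NAMES = ("analyser", "generator", "cg")


def install_custom_models(base_folder, language, **sources):
    """Copy model files into <base_folder>/<language>/ under the expected names.

    `sources` maps a model name ("analyser", "generator" or "cg") to the
    path of the file you built. Returns the language directory.
    """
    unknown = set(sources) - set(MODEL_NAMES)
    if unknown:
        raise ValueError(f"unknown model names: {sorted(unknown)}")
    target_dir = Path(base_folder) / language
    target_dir.mkdir(parents=True, exist_ok=True)
    for name, source in sources.items():
        # The destination deliberately has no file extension.
        shutil.copyfile(source, target_dir / name)
    return target_dir
```

Here `base_folder` would be one of the locations reported by the snippet above, and `language` the three-letter code of your language, for example `install_custom_models(folder, "fin", analyser="/path_to_your/transducer.hfstol")`.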
Uninstalling models
If you want to free up some space, or are unsure which models will be loaded when uralicNLP is used, you can also uninstall models easily:
from uralicNLP import uralicApi
uralicApi.uninstall("fin")
Using your own transducers
You can use your own transducer file with uralicNLP by passing a filename parameter:
from uralicNLP import uralicApi
uralicApi.generate("kissa+N+Pl+Nom", "fin", filename="/path_to_your/transducer.hfstol")
uralicApi.analyze("kissat", "fin", filename="/path_to_your/transducer.hfstol")
uralicApi.lemmatize("kissat", "fin", filename="/path_to_your/transducer.hfstol")
Model info
Use uralicApi.model_info(language) to see information about the FSTs and CGs such as license and authors. If you know how to make this information more accurate, please don't hesitate to open an issue on GitHub.
from uralicNLP import uralicApi
uralicApi.model_info("fin")
Access the HFST transducer
If you need lower-level access to the HFST transducer object, you can use the following code:
from uralicNLP import uralicApi
sms_generator = uralicApi.get_transducer("sms", analyzer=False)  # generator
sms_analyzer = uralicApi.get_transducer("sms", analyzer=True)  # analyzer
The same parameters can be used here as for generate() and analyze() to specify whether you want to use the normative or descriptive analyzers and so on. The defaults are get_transducer(language, cache=True, analyzer=True, descriptive=True, dictionary_forms=True).
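To make the defaults explicit when overriding only one of them, the keyword arguments can be collected in one place. This is a minimal sketch: `transducer_options` and `load_transducer` are hypothetical helpers, not part of uralicNLP, and the default values are copied from the signature quoted above.

```python
# Defaults of get_transducer() as listed above.
GET_TRANSDUCER_DEFAULTS = {
    "cache": True,
    "analyzer": True,
    "descriptive": True,
    "dictionary_forms": True,
}


def transducer_options(**overrides):
    """Return the full keyword set for get_transducer(), with overrides applied."""
    unknown = set(overrides) - set(GET_TRANSDUCER_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    return {**GET_TRANSDUCER_DEFAULTS, **overrides}


def load_transducer(language, **overrides):
    """Fetch a transducer for `language` with the merged options."""
    # Imported lazily so the helpers above can be used without uralicNLP installed.
    from uralicNLP import uralicApi
    return uralicApi.get_transducer(language, **transducer_options(**overrides))
```

For instance, `load_transducer("sms", descriptive=False)` should request the normative rather than the descriptive analyzer, assuming the Skolt Sami models have been downloaded.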