Lemmatizer Design Algorithm Notes - Hedera-Lang-Learn/hedera GitHub Wiki
Django App
- Python protocol: each lemmatizer is a Python module implementing a common interface; the module to use is configured in settings.py
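One possible shape for this, as a minimal sketch rather than the project's actual code: a small interface that every backend satisfies, plus a loader that turns a dotted path from settings.py into an instance. The names `Lemmatizer`, `DictLemmatizer`, `load_lemmatizer`, and the `LEMMATIZER` setting are all hypothetical, and the lookup table is toy data.

```python
from importlib import import_module
from typing import List, Protocol


class Lemmatizer(Protocol):
    """Hypothetical interface: one token in, candidate lemmas out."""

    def lemmatize(self, token: str) -> List[str]:
        ...


class DictLemmatizer:
    """Toy backend using a hard-coded table (illustration only)."""

    TABLE = {"amat": ["amo"], "rosae": ["rosa"]}

    def lemmatize(self, token: str) -> List[str]:
        # Unknown tokens yield an empty candidate list.
        return self.TABLE.get(token.lower(), [])


def load_lemmatizer(dotted_path: str) -> Lemmatizer:
    """Import 'package.module.ClassName' and return an instance of it."""
    module_path, _, class_name = dotted_path.rpartition(".")
    cls = getattr(import_module(module_path), class_name)
    return cls()


# In settings.py one might then write (assumed setting name and path):
# LEMMATIZER = "lemmatization.backends.DictLemmatizer"
```

Swapping in Morpheus or LemLat would then mean writing another class with the same `lemmatize` method and changing only the setting.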
Different lemmatizers:
- Helma Dik's info
- Morpheus
- LemLat
Lemmatizer Algorithm:
When new text comes in, distinguish between levels of certainty:
- the pasted text as a whole might already have been lemmatized.
- fragments of the text (n-grams, e.g., three- or four-word chunks) might already have been lemmatized.
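The certainty levels above could be sketched roughly as follows. This is an assumption-laden illustration, not a settled design: `annotate`, the `known` store (token tuples mapped to lemma tuples), and the labels "text", "ngram", and "unresolved" are all hypothetical names, and the matching is a simple greedy longest-first scan over three- and four-word chunks.

```python
from typing import Dict, List, Optional, Tuple

Ngram = Tuple[str, ...]
Annotation = Tuple[str, Optional[str], str]  # (token, lemma, certainty)


def annotate(tokens: List[str], known: Dict[Ngram, Ngram],
             max_n: int = 4) -> List[Annotation]:
    """Label each token by how it was resolved.

    "text":       the whole token sequence was lemmatized before
    "ngram":      the token sits inside a known chunk of 3..max_n words
    "unresolved": no earlier lemmatization covers the token
    """
    whole = tuple(tokens)
    if whole in known:
        return [(t, l, "text") for t, l in zip(tokens, known[whole])]

    out: List[Annotation] = []
    i = 0
    while i < len(tokens):
        # Try the longest chunk first, down to three words.
        for n in range(min(max_n, len(tokens) - i), 2, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in known:
                out.extend((t, l, "ngram") for t, l in zip(chunk, known[chunk]))
                i += n
                break
        else:
            # No known chunk starts here; leave it for a lemmatizer backend.
            out.append((tokens[i], None, "unresolved"))
            i += 1
    return out


# Toy data: one previously lemmatized three-word chunk.
known = {("arma", "virumque", "cano"): ("arma", "vir", "cano")}
result = annotate(["arma", "virumque", "cano", "Troiae"], known)
```

Here the first three tokens come back labeled "ngram" and "Troiae" comes back "unresolved". A greedy scan can miss overlapping matches, which is one reason a lattice representation may be a better fit.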
Find two or three examples and work off of them?
How will we deal with phrases / set expressions? How will this work with the lattice? How will we deal with non-contiguous phrases?
The interface will have to support breaking up phrases (or creating them?).