Lemmatizer Design Algorithm Notes - Hedera-Lang-Learn/hedera GitHub Wiki

Django App

  • Python protocol: module configured in settings.py
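A minimal sketch of what "Python protocol, module configured in settings.py" could look like. The names here are assumptions made for illustration and are not taken from the hedera codebase: the `Lemmatizer` protocol, its `lemmatize` method, the `DictLemmatizer` stand-in, and a hypothetical `LEMMATIZER_CLASS` setting holding a dotted path. Inside Django, `django.utils.module_loading.import_string` does the same job as the loader below; plain `importlib` is used so the sketch runs on its own.

```python
from importlib import import_module
from typing import Dict, List, Protocol


class Lemmatizer(Protocol):
    """Hypothetical interface that every lemmatizer backend would satisfy."""

    def lemmatize(self, token: str) -> List[str]:
        """Return candidate lemmas for a single token (possibly empty)."""
        ...


class DictLemmatizer:
    """Toy backend for illustration: looks tokens up in an in-memory table."""

    def __init__(self, table: Dict[str, List[str]]) -> None:
        self.table = table

    def lemmatize(self, token: str) -> List[str]:
        # Case-insensitive lookup; unknown tokens yield no candidates.
        return self.table.get(token.lower(), [])


def load_class(dotted_path: str):
    """Resolve a dotted path such as 'package.module.ClassName' to a class.

    In settings.py this path would be stored under a setting such as
    LEMMATIZER_CLASS (a hypothetical name).
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    module = import_module(module_path)
    return getattr(module, class_name)
```

Swapping in Morpheus or LemLat would then mean writing another class with the same `lemmatize` method and changing the setting, without touching the calling code.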

Different lemmatizers:

  • Helma Dik's info
  • Morpheus
  • LemLat

Lemmatizer Algorithm:

When new text comes in, distinguish between levels of certainty:

  1. The whole pasted text might already have been lemmatized.
  2. Text fragments (n-grams), e.g., three- or four-word chunks, have already been lemmatized.
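The two certainty levels above can be sketched as a lookup that labels each token by the strongest evidence found for it. This is only an illustration under stated assumptions: the function name, the label strings (`"text"`, `"ngram"`, `"unknown"`), and the in-memory sets `known_texts` and `known_ngrams` are all hypothetical; a real implementation would presumably query the database instead.

```python
from typing import List, Sequence, Set, Tuple

TokenTuple = Tuple[str, ...]


def certainty_labels(
    tokens: Sequence[str],
    known_texts: Set[TokenTuple],
    known_ngrams: Set[TokenTuple],
    n: int = 3,
) -> List[str]:
    """Label each token 'text', 'ngram', or 'unknown'.

    'text'    : the whole token sequence was lemmatized before (level 1).
    'ngram'   : the token lies inside a known n-token chunk (level 2).
    'unknown' : no prior lemmatization covers this token.
    """
    # Level 1: the entire pasted text has been seen before.
    if tuple(tokens) in known_texts:
        return ["text"] * len(tokens)

    # Level 2: slide an n-token window and mark every token it covers.
    labels = ["unknown"] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in known_ngrams:
            for j in range(i, i + n):
                labels[j] = "ngram"
    return labels
```

For example, with `("arma", "virumque", "cano")` as the only known trigram, the tokens of "arma virumque cano troiae qui" would get three `"ngram"` labels followed by two `"unknown"` labels. Tokens labeled `"unknown"` are the ones that would still need a lemmatizer backend or human review.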

Find two or three examples and work off of them?

How will we deal with phrases / set expressions? How will this work with the lattice? How will we deal with non-contiguous phrases?

Interface will have to support breaking up phrases (or creating them?).