Python scripts to use with Jorkens - mcthulhu/jorkens GitHub Wiki

Jorkens exposes three or four plain text files that Python scripts can read: bookText.txt, chapterText.txt, and tokens.txt, which lists the tokens (words) in the current chapter, one per line. If text has been selected, there is also a fourth file, selection.txt.
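As a rough illustration of how a script might consume these files, the sketch below reads tokens.txt and counts the most frequent tokens in the current chapter. The folder in `DATA_DIR`, the UTF-8 encoding, and the helper names `read_tokens` and `most_common_tokens` are assumptions made for this example, not something Jorkens defines; adjust the path to wherever your installation writes the files.

```python
from collections import Counter
from pathlib import Path

# Assumption: the text files live here. Change this to the actual location.
DATA_DIR = Path.home() / "Documents" / "Jorkens"


def read_tokens(path):
    """Return the non-empty lines of a one-token-per-line file."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def most_common_tokens(tokens, limit=10):
    """Count tokens case-insensitively and return the most frequent ones."""
    counts = Counter(token.lower() for token in tokens)
    return counts.most_common(limit)


if __name__ == "__main__":
    tokens_file = DATA_DIR / "tokens.txt"
    if tokens_file.exists():
        for token, count in most_common_tokens(read_tokens(tokens_file)):
            print(f"{count:5d}  {token}")
    else:
        print(f"No tokens.txt found at {tokens_file}")
```

The same pattern should work for bookText.txt, chapterText.txt, and selection.txt, reading the whole file as one string instead of line by line.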

stanza-lemmatizer.py, which should be placed in the Documents/Jorkens/Python folder, lets Jorkens use Stanford's Stanza lemmatizer to supplement or replace TreeTagger (Stanza supports more languages). To use it, first install Stanza by running "pip install stanza" at the command line. Then download the data for each language you want to use by entering "import stanza" followed by "stanza.download('es')" (here, for Spanish) at the Python prompt. You will need to know the correct Stanza language code for each language.
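For orientation, here is a minimal sketch of what lemmatizing text with Stanza can look like once the package is installed. It is not the contents of stanza-lemmatizer.py; the function names `build_pipeline` and `lemmatize` are hypothetical, and the processor list is the one suited to Spanish.

```python
def build_pipeline(lang="es"):
    """Download the models for lang (if needed) and build a lemmatizing pipeline."""
    import stanza  # imported here so lemmatize() can be used without Stanza installed

    stanza.download(lang)  # fetches the language data on first use
    # "mwt" (multi-word token expansion) applies to Spanish; some languages
    # have no such model, so this list may need adjusting for other codes.
    return stanza.Pipeline(lang, processors="tokenize,mwt,pos,lemma")


def lemmatize(text, nlp):
    """Return a (word, lemma) pair for every word in text."""
    doc = nlp(text)
    return [
        (word.text, word.lemma)
        for sentence in doc.sentences
        for word in sentence.words
    ]


# Example use (requires Stanza and the Spanish models):
# nlp = build_pipeline("es")
# for word, lemma in lemmatize("Los gatos duermen en la casa.", nlp):
#     print(word, lemma)
```

Building the pipeline is the slow step, so a script that lemmatizes many passages would probably build it once and reuse it.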

multi-rake.py is a Python script that extracts key phrases from text. As written, the language is hardcoded to 'es' (Spanish) and the number of results returned is set to 50; you can edit the script to change either value.
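The sketch below shows one way the language and result count could be turned into parameters, assuming the script relies on the multi-rake package ("pip install multi-rake"). The function names `make_rake` and `top_phrases` are hypothetical and are not taken from multi-rake.py.

```python
def make_rake(language_code="es"):
    """Create a multi-rake extractor for one language."""
    from multi_rake import Rake  # imported here so top_phrases() has no hard dependency

    return Rake(language_code=language_code)


def top_phrases(text, rake, limit=50):
    """Return at most `limit` (phrase, score) tuples extracted from text."""
    return rake.apply(text)[:limit]


# Example use (requires multi-rake):
# rake = make_rake("es")
# with open("chapterText.txt", encoding="utf-8") as handle:  # path is an assumption
#     for phrase, score in top_phrases(handle.read(), rake, limit=50):
#         print(f"{score:.2f}  {phrase}")
```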

stanza-syntactic-dependency-parsing.py is a Python script that shows the syntactic relationships among the parts of a sentence.
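To give a sense of what such output contains, here is a minimal sketch that lists each word with its dependency relation and the word it depends on, using a Stanza pipeline with the "depparse" processor. It is an illustration under those assumptions rather than the script itself, and the function name `dependency_rows` is hypothetical.

```python
def dependency_rows(doc):
    """Return (id, word, relation, head word) tuples for a parsed Stanza document."""
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            # word.head is the 1-based index of the governing word; 0 marks the root.
            head = sentence.words[word.head - 1].text if word.head > 0 else "ROOT"
            rows.append((word.id, word.text, word.deprel, head))
    return rows


# Example use (requires Stanza and the Spanish models):
# import stanza
# nlp = stanza.Pipeline("es", processors="tokenize,mwt,pos,lemma,depparse")
# for row in dependency_rows(nlp("Ella canta una canción.")):
#     print(*row, sep="\t")
```

Each row pairs a word with its head, so the full set of rows describes the dependency tree of the sentence.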