Recogito Tutorial: NER algorithms - pelagios/pelagios.github.io GitHub Wiki
When working on a text document, Recogito lets you run Named Entity Recognition (NER) algorithms over your uploaded text to pre-highlight references to places and persons. In the case of places, Recogito will also try to match the place-names to entries in the available gazetteers. This automatic annotation cannot be expected to be either thorough or completely reliable: some places won’t be identified, some will be confused with people or other entities, and some will be associated with the wrong gazetteer entry. It is still a powerful tool for speeding up the annotation process and creating provisional visualisations. You can always change and refine the automatic annotations manually, either one by one or using the bulk mode. In addition, all annotations created by NER algorithms will show as “unverified”: turning them “green”, as ever, requires a user to confirm them, whether one by one or in bulk.
To start automatic annotation, go to the “My Documents” area and select the document of your choice by clicking on it once. The “Options” button will appear in the top right corner. From that menu, among the various features described under Document Options, select “Named Entity Recognition”. This opens another interface that asks you to make two choices, both of which depend on the nature of the document you are annotating.
- Select a Recognition Engine: NER algorithms are language-specific, so automatic annotation works best when you select the engine that matches the language of your document. Currently, we don’t have NER for all languages, but we are constantly working with our community to expand the options. If you can’t find the language of your document, try the engine for the most closely related language. For example, if your document is in Dutch, you may want to try the German algorithm.
- Select the relevant gazetteers: Recogito offers a number of gazetteers that differ in coverage and scope. They may not all be relevant to your document, and you may want to limit the automatic matching of place-names to the gazetteers that are pertinent. By default, Recogito will use all the available gazetteers; however, if you disable the “identify against all available authorities” option, you can select only the most appropriate gazetteers. To read more about our gazetteer options, check the Annotation Preferences section.
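
To make the effect of these two choices more concrete, here is a toy sketch in Python. It is not Recogito's code and does not call any Recogito API: the fallback table, the gazetteer identifiers (`ancient_places`, `modern_places`) and the functions `pick_engine` and `match_place` are all hypothetical, invented for illustration. It only shows the idea of falling back to a related-language engine and of restricting place-name matching to a chosen subset of gazetteers.

```python
# Toy illustration only: none of these names come from Recogito itself.
from dataclasses import dataclass

# Hypothetical fallback table: document language -> closest available engine.
FALLBACK_ENGINE = {"nl": "de"}  # e.g. Dutch text, German engine


@dataclass(frozen=True)
class GazetteerEntry:
    gazetteer: str  # hypothetical gazetteer identifier
    name: str       # place-name as listed in that gazetteer


def pick_engine(doc_language, available_engines):
    """Return the engine for the document language, else a related-language fallback."""
    if doc_language in available_engines:
        return doc_language
    fallback = FALLBACK_ENGINE.get(doc_language)
    if fallback in available_engines:
        return fallback
    return None  # no suitable engine: annotate manually


def match_place(name, entries, selected_gazetteers=None):
    """Return matching entries; selected_gazetteers=None means 'all authorities'."""
    return [
        entry for entry in entries
        if entry.name.lower() == name.lower()
        and (selected_gazetteers is None or entry.gazetteer in selected_gazetteers)
    ]


entries = [
    GazetteerEntry("ancient_places", "Alexandria"),
    GazetteerEntry("modern_places", "Alexandria"),
]

engine = pick_engine("nl", {"en", "de"})
all_matches = match_place("Alexandria", entries)
ancient_only = match_place("Alexandria", entries, {"ancient_places"})
```

With all gazetteers enabled, the same place-name yields two candidate entries; restricting the selection to one gazetteer leaves a single candidate, which is why limiting the list to pertinent gazetteers tends to reduce wrong matches.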