pos_tagger_desc - apache/ctakes GitHub Wiki

This project provides a UIMA wrapper around the popular OpenNLP part-of-speech tagger. The UIMA examples project provides a default wrapper, from which we have borrowed liberally. We created our own wrapper so that it works better with our type system and so that we could add features and supporting components.
Additionally, neither the OpenNLP package nor the UIMA examples OpenNLP wrappers document how to generate training data, build a part-of-speech tagging model, or build a tag dictionary. Building a tag dictionary in particular can be very confusing if you are new to OpenNLP. We have attempted to provide all of the necessary documentation here.

A part-of-speech tagging model is included with this project.

The model derives from a combination of GENIA, Penn Treebank (Wall Street Journal), and clinical data anonymized per the HIPAA Safe Harbor guidelines. Prior to model building, patient names in the clinical data were de-identified to preserve patient confidentiality. Any person name in the model therefore originates from a non-patient data source.