POS (Part of Speech) tagging - SoojungHong/TextMining GitHub Wiki

Reference: https://nlpforhackers.io/training-pos-tagger/

Part-of-speech tagging (POS tagging for short) is one of the main components of almost any NLP analysis. The task is to label each word in a text with its appropriate part of speech (noun, verb, adjective, adverb, pronoun, …).

The most popular tag set is the Penn Treebank tag set, and most pre-trained taggers for English are trained on it. Examples of such taggers are:

  • NLTK default tagger
  • Stanford CoreNLP tagger
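
To make the output format concrete, here is a small sketch of what a tagged sentence looks like: a list of (word, tag) pairs using Penn Treebank tags. The sentence and its tags are hand-annotated for illustration and were not produced by either tagger listed above; the names `PENN_TAGS` and `tagged` are made up for this example.

```python
# A few Penn Treebank tags and what they stand for.
PENN_TAGS = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "IN": "preposition or subordinating conjunction",
}

# Hand-tagged example sentence: one (word, tag) pair per token.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"),
]

# Print each word next to its tag and the tag's meaning.
for word, tag in tagged:
    print(f"{word:<6} {tag:<4} {PENN_TAGS[tag]}")
```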

You can build simple taggers such as:

  • DefaultTagger, which tags every word with the same tag
  • RegexpTagger, which assigns tags according to a set of regular expressions
  • UnigramTagger, which picks the most frequent tag for each known word
  • BigramTagger and TrigramTagger, which work like the UnigramTagger but also take some of the preceding context into account
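
The ideas behind these taggers can be sketched in plain Python. NLTK ships classes with these names, but the code below is not NLTK's API: it is a minimal re-implementation under stated assumptions. The training data is a tiny hand-made set, every function name is hypothetical, and the bigram tagger is assumed to condition on the previous word's tag together with the current word. Taggers are chained so that a more specific one falls back to a simpler one when it has no answer.

```python
import re
from collections import Counter, defaultdict

# Hand-made training data with Penn Treebank tags (illustrative, not a real corpus).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("I", "PRP"), ("can", "MD"), ("run", "VB")],
    [("she", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("the", "DT"), ("can", "NN"), ("fell", "VBD")],
]

def default_tagger(tag):
    """Tag every word with the same tag."""
    return lambda word, prev_tag: tag

def regexp_tagger(patterns):
    """Tag a word with the tag of the first pattern that matches it."""
    compiled = [(re.compile(pattern), tag) for pattern, tag in patterns]
    def choose(word, prev_tag):
        for regex, tag in compiled:
            if regex.search(word):
                return tag
        return None  # no pattern matched, let the next tagger decide
    return choose

def _most_frequent(counts):
    """Map each context to the tag seen most often with it."""
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def unigram_tagger(train):
    """Pick the most frequent tag for a known word; None for unknown words."""
    counts = defaultdict(Counter)
    for sentence in train:
        for word, tag in sentence:
            counts[word][tag] += 1
    best = _most_frequent(counts)
    return lambda word, prev_tag: best.get(word)

def bigram_tagger(train):
    """Like the unigram tagger, but keyed on (previous tag, current word)."""
    counts = defaultdict(Counter)
    for sentence in train:
        prev_tag = None  # None marks the start of a sentence
        for word, tag in sentence:
            counts[(prev_tag, word)][tag] += 1
            prev_tag = tag
    best = _most_frequent(counts)
    return lambda word, prev_tag: best.get((prev_tag, word))

def tag_sentence(words, taggers):
    """Tag left to right, using the first tagger in the chain that answers."""
    tagged, prev_tag = [], None
    for word in words:
        tag = None
        for tagger in taggers:
            tag = tagger(word, prev_tag)
            if tag is not None:
                break
        tagged.append((word, tag))
        prev_tag = tag
    return tagged

patterns = [(r"^\d+$", "CD"), (r"ing$", "VBG"), (r"ed$", "VBD")]
fallback = [regexp_tagger(patterns), default_tagger("NN")]
unigram_chain = [unigram_tagger(TRAIN)] + fallback
bigram_chain = [bigram_tagger(TRAIN)] + unigram_chain

print(tag_sentence(["the", "can", "fell"], unigram_chain))
print(tag_sentence(["the", "can", "fell"], bigram_chain))
print(tag_sentence(["the", "dog", "jumped"], bigram_chain))
```

With this toy data, the unigram chain tags "can" as MD because that is its most frequent tag overall, while the bigram chain tags it as NN after a determiner. The unseen word "jumped" falls through to the regular expressions and is tagged VBD.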