N gram - sagr4019/ResearchProject GitHub Wiki
N-gram
N-grams are used in text processing. A text is split into fragments (e.g. the words of a sentence or the letters of a word), and an n-gram is a contiguous sequence of n such fragments. N-grams are used to analyze texts and to predict the next fragment. They are used in many applications, such as machine translation and spelling correction.
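Splitting a text into fragments and forming n-grams from them can be sketched as follows. This is a minimal illustration, not the project's implementation. The helper names `ngrams`, `word_fragments` and `letter_fragments` are hypothetical. The word splitter also assumes that a "word" is a run of letters, digits or apostrophes and that punctuation is dropped.

```python
import re

def ngrams(fragments, n):
    """Return every contiguous run of n fragments as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 n-grams (none if L < n).
    return [tuple(fragments[i:i + n]) for i in range(len(fragments) - n + 1)]

def word_fragments(text):
    # Assumption: words are runs of word characters or apostrophes;
    # punctuation such as the final "." is discarded.
    return re.findall(r"[\w']+", text)

def letter_fragments(word):
    # Every character of the word is one fragment.
    return list(word)

if __name__ == "__main__":
    # Word-level bigrams: ('To', 'be'), ('be', 'or'), ...
    print(ngrams(word_fragments("To be or not to be."), 2))
    # Letter-level bigrams of "text": ('t', 'e'), ('e', 'x'), ('x', 't')
    print(ngrams(letter_fragments("text"), 2))
```

The same `ngrams` function serves both levels because it only assumes a list of fragments. Only the splitting step differs.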
- n-grams with n=1 are called "unigrams"
- n-grams with n=2 are called "bigrams"
- n-grams with n=3 are called "trigrams"
- n-grams with n=4, n=5, ... are usually called "four-grams", "five-grams", ...
For example, take the sentence "To be or not to be."
Bigrams (n=2) would look like this:
"To be"
"be or"
"or not"
...
Trigrams (n=3) would look like this:
"To be or"
"be or not"
"or not to"
...
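The bigrams above can also be used to predict the next fragment. The sketch below counts which word follows which and proposes the most frequent follower. This is a deliberately simple, hedged illustration and not a method described on this page. The function names `train_bigram_model` and `predict_next` are hypothetical. The code lowercases the text and drops punctuation, so "To" and "to" count as the same fragment.

```python
import re
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, how often each other word follows it."""
    # Assumption: lowercase words, punctuation dropped.
    words = re.findall(r"[\w']+", text.lower())
    followers = defaultdict(Counter)
    # zip(words, words[1:]) yields exactly the word bigrams of the text.
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

if __name__ == "__main__":
    model = train_bigram_model("To be or not to be.")
    print(predict_next(model, "to"))   # both occurrences of "to" are followed by "be"
    print(predict_next(model, "not"))
```

For the example sentence, "to" is followed by "be" twice, so the model predicts "be" after "to". Dividing each count by the total for that word would turn the counts into the conditional probabilities P(next | current) that n-gram language models use.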
References
https://de.wikipedia.org/wiki/N-Gramm
https://en.wikipedia.org/wiki/N-gram
http://text-analytics101.rxnlp.com/2014/11/what-are-n-grams.html