Home - PeppermintT/Learning-NLP GitHub Wiki

Python for Natural Language Processing - ch 1

Core concepts

  1. Concordance A concordance view shows us every occurrence of a given word, together with some context - a handful of words on either side.

  2. Common contexts A common-contexts view shows the contexts shared by two or more words - for example, the contexts in which both "monstrous" and "very" appear.

  3. Dispersion plot This plots the location of each occurrence of a word in the text: how many words from the beginning it appears. It reveals which words run throughout the text and which appear only periodically. It is probably most useful when comparing two or more texts.

  4. Collocations A collocation is a sequence of words that occur together unusually often, e.g. "white wine". Collocations are resistant to substitution: we don't see "magnolia wine".
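The concordance view above can be sketched in plain Python (NLTK's `Text.concordance` does the same with nicer column alignment; the `tokens`, `word` and `width` names here are illustrative, not NLTK's):

```python
def concordance(tokens, word, width=2):
    """Return every occurrence of `word` with `width` tokens of context each side."""
    word = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word:
            left = tokens[max(0, i - width):i]   # up to `width` tokens before
            right = tokens[i + 1:i + 1 + width]  # up to `width` tokens after
            lines.append(" ".join(left + [tok] + right))
    return lines

tokens = "the whale was a very monstrous size and the monstrous whale swam".split()
for line in concordance(tokens, "monstrous"):
    print(line)
# a very monstrous size and
# and the monstrous whale swam
```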

Lexical richness can be measured as len(text) / len(set(text)). set() collapses the text to its unique words (types), so the ratio tells us how many times each word is used on average.
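A quick sketch of that measure (the `text` variable is just an illustrative token list, not one of NLTK's built-in texts):

```python
text = "the cat sat on the mat and the dog sat too".split()

unique = set(text)                       # distinct words (types): 8 of them
repetition = len(text) / len(unique)     # avg. uses per distinct word
diversity = len(unique) / len(text)      # the inverse: proportion of distinct words

print(len(text), len(unique))            # 11 8
print(round(repetition, 3))              # 1.375
```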

  5. Frequency distributions Examples:
  • fdist = FreqDist(samples) create a frequency distribution containing the given samples
  • fdist['monstrous'] count of the number of times a given sample occurred
  • fdist.freq('monstrous') frequency of a given sample
  • fdist.N() total number of samples
  • fdist.most_common(n) the n most common samples and their frequencies
  • for sample in fdist: iterate over the samples
  • fdist.max() sample with the greatest count
  • fdist.tabulate() tabulate the frequency distribution
  • fdist.plot() graphical plot of the frequency distribution
  • fdist.plot(cumulative=True) cumulative plot of the frequency distribution
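An NLTK FreqDist is essentially a counting dictionary, so most of the operations above can be sketched with the standard library's collections.Counter (the `samples` list is illustrative; `.freq()` and `.N()` are approximated with plain arithmetic):

```python
from collections import Counter

samples = "the monstrous whale and the monstrous sea".split()
fdist = Counter(samples)                 # like FreqDist(samples)

print(fdist["monstrous"])                # count of a given sample -> 2
total = sum(fdist.values())              # like fdist.N(): total samples -> 7
print(fdist["monstrous"] / total)        # like fdist.freq('monstrous')
print(fdist.most_common(2))              # the 2 most common samples with counts
print(max(fdist, key=fdist.get))         # like fdist.max(): sample with greatest count
```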

6) Language Understanding Technologies - Overview

a) Word sense disambiguation In word sense disambiguation we want to work out which sense of a word was intended in a given context. (Eg serve: help with food or drink; hold an office; put ball into play.) In a sentence containing the phrase "he served the dish", you can detect that both serve and dish are being used with their food meanings.

b) Pronoun resolution - who did what to whom?

If we can solve questions relating to language understanding we can work towards language outputs and machine translation.

c) Machine Translation The roots of machine translation go back to the early days of the Cold War, when the promise of automatic translation led to substantial government sponsorship, and the start of NLP. There are still lots of shortcomings - the textbook contains an example of taking a sentence in English and translating it back and forth from German, where it quickly becomes nonsensical. The task is difficult because word order must be changed in keeping with the grammatical structure of the target language. Parallel texts are used from news and government websites that publish documents in two or more languages. Given a document in German and English, and possibly a bilingual dictionary, we can automatically pair up the sentences, a process called text alignment.

d) Textual entailment - we will come back to this in more detail. In a nutshell, there are severe limitations on the reasoning and knowledge capabilities of natural language systems.