NLP - stereoboy/Study GitHub Wiki

Contents

Project 01

AIND-NLP: https://github.com/stereoboy/AIND-NLP.git

  • Text Processing
    • Cleaning
      • from bs4 import BeautifulSoup
    • Normalization
      • Lower() & Punctuation Removal
    • Tokenization
      • NLTK: the Natural Language Toolkit
      • Named Entity Recognition
    • Stemming & Lemmatization
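The cleaning/normalization/tokenization steps above can be sketched with the standard library alone. The lessons use BeautifulSoup for cleaning and NLTK for tokenization; this minimal version substitutes regexes and a whitespace split:

```python
import re

def clean(html):
    # Cleaning: crude tag stripping (the lessons use BeautifulSoup's get_text())
    return re.sub(r"<[^>]+>", " ", html)

def normalize(text):
    # Normalization: lowercase, then replace punctuation with spaces
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())

def tokenize(text):
    # Tokenization: whitespace split (the lessons use nltk.word_tokenize)
    return normalize(text).split()
```

For example, `tokenize(clean("<p>Dr. Smith's NLP demo, part 1!</p>"))` yields the lowercased, punctuation-free token list.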

NLP-Exercises: https://github.com/stereoboy/NLP-Exercises.git

Viterbi Algorithm

  • Dynamic programming algorithm for finding the optimal path (the state sequence with the highest probability) through an HMM
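A minimal sketch of Viterbi decoding on a toy two-tag HMM. The state names and all probabilities below are made-up illustration values, not taken from the course:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (best probability of any path ending in state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(obs[t], 0.0), p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy tagger: two states (noun/verb), invented probabilities
states = ["NN", "VB"]
start_p = {"NN": 0.6, "VB": 0.4}
trans_p = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.8, "VB": 0.2}}
emit_p = {"NN": {"fish": 0.5, "sleep": 0.1}, "VB": {"fish": 0.2, "sleep": 0.6}}
```

Running `viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p)` picks the single best tag sequence instead of enumerating all paths.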

Further Reading: Speech and Language Processing by Daniel Jurafsky and James H. Martin.

Main Project: HMM-Tagger

Hidden Markov Model Part of Speech tagger project

Project 02

Lesson 01: Feature extraction and embedding

  • Keywords
    • Bag of Words/TF-IDF
    • One-hot encoding
    • Word Embeddings/Word2Vec/GloVe
    • t-SNE
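A stdlib-only sketch of the Bag of Words/TF-IDF idea over a toy tokenized corpus. The documents and the exact weighting variant (raw tf times log idf, no smoothing) are illustrative assumptions; libraries like scikit-learn use smoothed variants:

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of tokenized documents; returns one {word: weight} dict per doc
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency per word
    weights = []
    for d in docs:
        tf = Counter(d)  # bag-of-words counts for this document
        weights.append({w: (tf[w] / len(d)) * math.log(n / df[w]) for w in tf})
    return weights

# Toy corpus: "the" appears in every document, so its idf (and weight) is zero
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
```

Words shared by every document get weight 0, while rarer words like "sat" score highest, which is exactly the down-weighting TF-IDF is meant to provide.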

Lesson 02: Topic Modeling

Lesson 05: Deep Learning Attention

Super interesting computer vision applications using attention:

NLP Application: Google Neural Machine Translation

The best demonstration of an application is by looking at real-world systems that are in production right now. In late 2016, Google released the following paper describing Google’s Neural Machine Translation System:

This system later went into production, powering Google Translate.

Take a stab at reading the paper and connecting it to what we've discussed in this lesson so far. Below are a few questions to guide this external reading:

  • Is Google’s Neural Machine Translation System a sequence-to-sequence model?
  • Does the model utilize attention?
  • If the model does use attention, does it use additive or multiplicative attention?
  • What kind of RNN cell does the model use?
  • Does the model use bidirectional RNNs at all?
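As background for the additive-vs-multiplicative question above, here is a minimal sketch of multiplicative (dot-product) attention on plain Python lists. The query, keys, and values are toy vectors, not from the paper:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(query, keys, values):
    # Multiplicative attention: score each encoder key by its dot product
    # with the decoder query, then softmax the scores into weights
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Context vector = attention-weighted sum of the encoder values
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

Additive (Bahdanau-style) attention would instead score each key with a small feed-forward network over the concatenated query and key; the weighting and context-vector steps are the same.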

Lesson 6: RNN Keras Lab, Deciphering Code with Character-Level RNN

Main Project: Machine Translation

Project 03

Lesson 3: Speech Recognition

04 References: Signal Analysis

07 Feature Extraction

  • Feature Extraction: A summary of methods used in ASR:
  • Mel Scale

The Mel Scale was developed in 1937 and is based on human studies of pitch perception: humans distinguish pitches better at lower frequencies. Read more about it on Wikipedia (https://en.wikipedia.org/wiki/Mel_scale)
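A common closed form for the mel scale is the O'Shaughnessy formula (the constants 2595 and 700 are the standard published values, not from this wiki):

```python
import math

def hz_to_mel(f):
    # O'Shaughnessy formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse mapping back to Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction 1000 Hz maps to roughly 1000 mel, and equal steps in mel correspond to progressively larger steps in Hz at higher frequencies, mirroring how pitch discrimination degrades.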

  • The Source/Filter Model

  • MFCC

    Mel Frequency Cepstrum Coefficient Analysis is the reduction of an audio signal to essential speech component features using both mel frequency analysis and cepstral analysis. The range of frequencies is reduced and binned into groups of frequencies that humans can distinguish. The signal is further separated into source and filter so that variations between speakers unrelated to articulation can be filtered away. The following reference provides nice visualizations of the process of audio -> spectrogram -> MFCC:

  • MFCC Deltas and Delta-Deltas

    Intuitively, it makes sense that changes in frequencies (deltas) and changes in changes in frequencies (delta-deltas) might also be meaningful features in speech recognition. The following succinct tutorial for MFCCs includes a short discussion on deltas and delta-deltas:
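A rough sketch of delta features as a central difference across neighboring frames. Real ASR front ends typically fit a regression over a window of several frames; this simplified version just differences the immediate neighbors (with clamping at the edges):

```python
def deltas(frames):
    # Central difference: rate of change of one MFCC coefficient across frames.
    # Edge frames reuse themselves as the missing neighbor.
    n = len(frames)
    return [(frames[min(i + 1, n - 1)] - frames[max(i - 1, 0)]) / 2
            for i in range(n)]
```

Applying `deltas` twice to a coefficient track gives the delta-deltas, so each frame's feature vector can carry the static coefficients plus their first and second derivatives.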

09 Phonetics

  • Phoneme/Grapheme
  • Lexical Decoding
  • Lexicon

12 Voice Data Lab

14. Acoustic Models and the Trouble with Time

  • DTW (Dynamic Time Warping)
  • CTC (Connectionist Temporal Classification)
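A minimal dynamic-programming sketch of DTW distance between two 1-D feature sequences (toy scalar data; real acoustic features are multi-dimensional vectors and would use a vector distance for the local cost):

```python
def dtw(a, b):
    # Dynamic time warping: minimum cumulative alignment cost between
    # sequences a and b, allowing stretches and compressions in time.
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance between frames
            # Extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

This is what lets a slowly spoken word align with a quickly spoken template; CTC addresses the same time-alignment problem, but as a differentiable training objective for neural networks rather than a template-matching distance.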

19. References: Traditional ASR

A bit of Computer History Museum nostalgia on Speech Recognition presents what we think of now as "Traditional" ASR:

22. Connectionist Temporal Classification

23. References: Deep Neural Network ASR

Main Project: DNN Speech Recognizer