Note of Lecture 1 - miya93/CS224n-Natural-Language-Processing-with-Deep-Learning GitHub Wiki

Introduction and Word Vectors

lecture link

1. Word Vectors

1.1 One-hot vector: Represent every word as an R|V|×1 vector that is all 0s except for a single 1 at the index of that word in the sorted English vocabulary.

  • Drawbacks: Each word is represented as a completely independent entity. As we previously discussed, this representation gives us no direct notion of similarity between words.

  • Solution: So maybe we can try to reduce the size of this space from R|V| to something smaller and thus find a subspace that encodes the relationships between words.
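The drawback above can be made concrete with a small sketch. The vocabulary and words here are hypothetical, chosen only for illustration: the dot product between any two distinct one-hot vectors is 0, so even closely related words look entirely dissimilar.

```python
import numpy as np

# Hypothetical toy vocabulary, assumed sorted.
vocab = ["cat", "hotel", "motel"]
V = len(vocab)

def one_hot(word):
    """Return the R^{|V| x 1} one-hot column vector for a word."""
    vec = np.zeros((V, 1))
    vec[vocab.index(word)] = 1.0
    return vec

# "hotel" and "motel" are related words, yet their dot product is 0 --
# exactly as dissimilar as "hotel" and "cat" under this representation.
print(float(one_hot("hotel").T @ one_hot("motel")))  # 0.0
print(float(one_hot("hotel").T @ one_hot("cat")))    # 0.0
```

Since every pairwise similarity is zero, no subspace structure relating words can be read off these vectors directly, which motivates the denser representations below.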

2. SVD Based Methods

For this class of methods to find word embeddings (otherwise known as word vectors), we first loop over a massive dataset and accumulate word co-occurrence counts in some form of a matrix X, and then perform Singular Value Decomposition on X to get a USVᵀ decomposition. We then use the rows of U as the word embeddings for all words in our dictionary.

  • Word-Document Matrix
  • Window based Co-occurrence Matrix
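The SVD pipeline above can be sketched in a few lines. The corpus and co-occurrence counts here are hypothetical toy values for a window-based co-occurrence matrix, assumed symmetric with window size 1:

```python
import numpy as np

# Toy window-based co-occurrence counts for a hypothetical 4-word vocabulary.
vocab = ["I", "like", "deep", "learning"]
X = np.array([
    [0, 2, 0, 0],
    [2, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Singular Value Decomposition: X = U S V^T
U, S, Vt = np.linalg.svd(X)

# Truncate to the first k singular vectors; each row of U[:, :k]
# is then the k-dimensional embedding for the corresponding word.
k = 2
embeddings = U[:, :k]
for word, vec in zip(vocab, embeddings):
    print(word, vec)
```

Truncating to the top k singular vectors keeps the directions of greatest variance in the co-occurrence statistics, which is how the method finds a smaller subspace encoding word relationships.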

3. Iteration Based Methods - Word2vec


Try to create a model that will be able to learn one iteration at a time and eventually be able to encode the probability of a word given its context.