Note of Lecture 1 - miya93/CS224n-Natural-Language-Processing-with-Deep-Learning GitHub Wiki
Introduction and Word Vectors
1. Word Vectors
1.1 One-hot vector: Represent every word as an R^{|V|×1} vector with all 0s and a single 1 at the index of that word in the sorted vocabulary.
- Drawbacks: We represent each word as a completely independent entity. As previously discussed, this representation does not directly give us any notion of similarity between words.
- Solution: Reduce the size of this space from R^{|V|} to something smaller, and thus find a subspace that encodes the relationships between words.
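The one-hot representation and its drawback can be sketched in a few lines; the tiny vocabulary below is an illustrative assumption, not from the lecture:

```python
# A minimal sketch of one-hot word vectors over a tiny example vocabulary.
import numpy as np

vocab = sorted(["hotel", "king", "motel", "queen"])
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return an R^{|V|} vector with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# Any two distinct one-hot vectors are orthogonal, so their dot product
# is 0 -- the representation carries no notion of similarity.
print(one_hot("hotel") @ one_hot("motel"))  # 0.0
print(one_hot("hotel") @ one_hot("hotel"))  # 1.0
```

Even intuitively similar words like "hotel" and "motel" have zero similarity under this encoding, which motivates the denser representations below.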
2. SVD Based Methods
For this class of methods to find word embeddings (otherwise known as word vectors), we first loop over a massive dataset and accumulate word co-occurrence counts in some form of a matrix X, and then perform Singular Value Decomposition on X to get a USVᵀ decomposition. We then use the rows of U as the word embeddings for all words in our dictionary.
- Word-Document Matrix
- Window based Co-occurrence Matrix
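The window-based variant can be sketched as follows; the three-sentence corpus and window size of 1 are illustrative assumptions:

```python
# A sketch of the SVD approach: build a window-based co-occurrence
# matrix X from a tiny corpus, decompose X = U S V^T, and keep the
# first k columns of U as word embeddings.
import numpy as np

corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

window = 1  # count neighbors within 1 word on each side
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                X[idx[w], idx[sent[j]]] += 1.0

U, S, Vt = np.linalg.svd(X)
k = 2  # reduced embedding dimensionality
embeddings = U[:, :k]  # one k-dimensional vector per word
print(embeddings.shape)  # (7, 2)
```

Truncating U to its first k columns keeps the directions associated with the largest singular values, which is what lets a small subspace capture most of the co-occurrence structure.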
3. Iteration Based Methods - Word2vec
Try to create a model that will be able to learn one iteration at a time and eventually be able to encode the probability of a word given its context.
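The core quantity such a model encodes can be sketched as a softmax over dot products of learned vectors, as in the skip-gram formulation of word2vec. The tiny random vectors here are illustrative assumptions, not trained embeddings:

```python
# Skip-gram sketch: score a context word o given a center word c as
# P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c),
# where u are "outside" (context) vectors and v are "center" vectors.
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                      # vocabulary size, embedding dimension
U = rng.normal(size=(V, d))      # outside/context vectors, one per word
W = rng.normal(size=(V, d))      # center vectors, one per word

def p_context_given_center(o, c):
    """Softmax probability of context word o given center word c."""
    scores = U @ W[c]
    e = np.exp(scores - scores.max())   # shift for numerical stability
    return e[o] / e.sum()

# The scores form a valid probability distribution over the vocabulary.
probs = [p_context_given_center(o, c=0) for o in range(V)]
print(round(sum(probs), 6))  # 1.0
```

Training then iterates over (center, context) pairs from a corpus and adjusts U and W so that observed pairs get higher probability, which is what "learning one iteration at a time" refers to.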