Questions Word2vec

  1. Table 1 gives an overview of the relationship types used in the evaluation.
    Would you expect to find the pair (kind, mean) in the "Opposite" type? Why, or why not?

  2. Which word would you expect to be the closest to the result of calculation vector("asymmetric") - vector("disrespectful") + vector("respectful")?
    Bonus: check your guess at http://epsilon-it.utu.fi/wv_demo/ (select English GoogleNews Negative300, use the Word analogy form), or check it offline with the sketch below.
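
A minimal sketch for checking the analogy offline, assuming you have gensim installed and a local copy of the GoogleNews vectors (the file name below is the usual distribution name; adjust the path as needed):

```python
# Load the pre-trained GoogleNews vectors (a large file, ~3.5 GB on disk).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# vector("asymmetric") - vector("disrespectful") + vector("respectful"):
# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest words (excluding the input words themselves).
print(vectors.most_similar(positive=["asymmetric", "respectful"],
                           negative=["disrespectful"], topn=3))
```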

  3. Let's have a vocabulary of V=4 words (loves, John, king, Mary) and a sentence s = "John loves Mary", i.e. s = (w1, w2, w3). Thus the 1-of-V (aka "one-hot") encodings of the words in s are

w1 = (0, 1, 0, 0)
w2 = (1, 0, 0, 0)
w3 = (0, 0, 0, 1)

Let's have an input projection matrix IN with dimensions V×D, where D=3, IN =

2.0   0.5  -0.5
0.2  -0.5   0.3
0.1  -0.1   0.4
0.8   0.5  -0.3

Let's have an output projection matrix OUT with dimensions D×V, OUT =

ln(4)  0       ln(2)   ln(3)  
ln(8)  0      -ln(8)  -ln(3)  
ln(8)  ln(4)   ln(8)   ln(3)  

where ln is the natural logarithm, ln(x) = log_e(x).
Let's use CBOW and Skip-gram with the matrices IN and OUT. IN is used with a linear activation function, and OUT with the full (not hierarchical) softmax.

3a) Compute the values of the hidden layer and the output layer of CBOW, when applied to w2 with a context of one previous and one following word (N=2).
For simplicity, follow Figure 1, which says SUM (although AVG is used in practice).
Compute P(w2|w1,w3).
3b) Compute the values of the hidden layer and the output vector of Skip-gram with C=1, when applied to w2.
Compute P(w1|w2) and P(w3|w2).

Hint: softmax(a,b,c)=(exp(a)/Z, exp(b)/Z, exp(c)/Z), where Z=exp(a)+exp(b)+exp(c).
Hint: the values of IN and OUT are chosen so you can compute the answers just with pen and paper (without calculator/computer).
Hint: be careful about the different ordering of words in the vocabulary V and in the sentence s: V = (v1, v2, v3, v4) = (w2, w1, king, w3), i.e. v3 = king does not occur in s.
Hint: you can verify your pen-and-paper results with the sketch below.
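
A minimal NumPy sketch of both forward passes, for checking your results (the matrices and the vocabulary ordering are copied from above):

```python
import numpy as np

IN = np.array([[2.0,  0.5, -0.5],   # v1 = loves
               [0.2, -0.5,  0.3],   # v2 = John
               [0.1, -0.1,  0.4],   # v3 = king
               [0.8,  0.5, -0.3]])  # v4 = Mary

OUT = np.array([[np.log(4), 0,         np.log(2),  np.log(3)],
                [np.log(8), 0,        -np.log(8), -np.log(3)],
                [np.log(8), np.log(4), np.log(8),  np.log(3)]])

def softmax(x):
    e = np.exp(x)
    return e / e.sum()

# One-hot vectors of the sentence words (note the V vs. s ordering).
w1, w2, w3 = np.eye(4)[[1, 0, 3]]

# 3a) CBOW: the hidden layer is the SUM of the context word embeddings.
hidden_cbow = (w1 + w3) @ IN
print("CBOW hidden:", hidden_cbow)
print("CBOW output:", softmax(hidden_cbow @ OUT))  # P(w2|w1,w3) = component for v1

# 3b) Skip-gram: the hidden layer is the embedding of the center word.
hidden_sg = w2 @ IN
print("Skip-gram hidden:", hidden_sg)
print("Skip-gram output:", softmax(hidden_sg @ OUT))  # P(w1|w2) = v2, P(w3|w2) = v4
```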

  4. One nice property of word vectors is that they are, in a sense, additive: computing vector("Czech") + vector("currency") tends to give you something which is both Czech and a currency. However, the meaning of phrases like "heavy rain", "kick the bucket", "New York Times" or "Toronto Maple Leafs" is not compositional, i.e. it is not a combination of the meanings of the individual words. How would you cope with that?
    Bonus: check the Nearest words form at http://epsilon-it.utu.fi/wv_demo/ for multi-word expressions (MWE) like swan_song or red_herring. Try to find one more non-compositional MWE and one compositional MWE present in this demo's vocabulary. Guess how these MWEs were selected and/or suggest your own approach; after making your guess, see the sketch below.
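
One documented approach (used to build the GoogleNews phrase vocabulary in Mikolov et al., 2013, "Distributed Representations of Words and Phrases and their Compositionality") is to score bigrams by how much more often they co-occur than chance and merge high-scoring bigrams into single tokens before training. A minimal sketch; the toy corpus and the discount value below are illustrative assumptions:

```python
from collections import Counter

corpus = "the new york times reported heavy rain in new york today".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

delta = 1  # discount penalizing very rare (possibly noisy) bigrams

def score(a, b):
    # score(a, b) = (count(a b) - delta) / (count(a) * count(b))
    return (bigrams[(a, b)] - delta) / (unigrams[a] * unigrams[b])

# Bigrams scoring above a threshold would be merged, e.g. "new york" -> "new_york".
for (a, b) in bigrams:
    print(f"{a}_{b}: {score(a, b):.3f}")
```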
