Questions NMT Attention
Bahdanau et al.: Neural Machine Translation by Jointly Learning to Align and Translate, 2014
Reading the appendices A, B, and C is not required. The main idea of the paper (described in Section 3.1) is now called "attention" or the "attention mechanism" (or even "Bahdanau-style attention" when referring to this particular implementation).
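For orientation, here is a minimal NumPy sketch of what "Bahdanau-style attention" computes for one decoder step (the alignment model of Section 3.1). The parameter names W_a, U_a, v_a follow the paper's notation, but the function name and the code itself are only an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_context(s_prev, H, W_a, U_a, v_a):
    """One decoder step of Bahdanau-style attention (illustrative sketch).

    s_prev -- previous decoder state s_{i-1}, shape (n,)
    H      -- encoder annotations h_1..h_Tx as rows, shape (Tx, 2n)
    W_a, U_a, v_a -- parameters of the feedforward alignment model
    """
    # e_ij = v_a . tanh(W_a s_{i-1} + U_a h_j)   -- alignment scores
    e = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a
    # alpha_ij = softmax_j(e_ij)                 -- soft-alignment weights
    alpha = softmax(e)
    # c_i = sum_j alpha_ij h_j                   -- context vector
    return alpha @ H, alpha
```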
- What are the limitations of the encoder-decoder approach to machine translation?
- What are the benefits of using a bidirectional RNN in machine translation? Do you think a Bi-RNN helps even without the attention?
- Let's translate a three-word sentence as in the paper, but with a simplified alignment model a(s_{i-1}, h_j) = s_{i-1} · h_j (i.e. using a dot product instead of a feedforward neural network).
  Let the forward hidden states be (0.1, 0.2), (-0.3, 0.4) and (0.5, 0)
  and the backward hidden states be (0.2, 0.2), (0.5, -0.3) and (-0.1, 0.5).
  Suppose that after translating the first word, we have s_1 = (-2, 1, 1, 1).
  Compute c_2. (A small NumPy sketch of this computation is given after the list below.)
- What are the benefits of soft-alignment?
- Introduction to LSTM (and GRU): http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- More tricks in attention-based NMT (e.g. Luong-style attention): https://arxiv.org/pdf/1508.04025.pdf
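As promised above, a small NumPy sketch of the c_2 computation from the third question: dot-product scores e_2j = s_1 · h_j, softmax weights alpha_2j, and the weighted sum of the annotations, following Section 3.1 of the paper. The variable names are ours.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Annotations h_j: forward and backward hidden states concatenated.
h_fwd = np.array([[0.1, 0.2], [-0.3, 0.4], [0.5, 0.0]])
h_bwd = np.array([[0.2, 0.2], [0.5, -0.3], [-0.1, 0.5]])
H = np.concatenate([h_fwd, h_bwd], axis=1)   # shape (3, 4), rows are h_1, h_2, h_3

s_1 = np.array([-2.0, 1.0, 1.0, 1.0])        # decoder state after the first word

# Simplified alignment model from the question: e_2j = s_1 . h_j
e = H @ s_1
alpha = softmax(e)        # attention weights alpha_2j
c_2 = alpha @ H           # context vector c_2 = sum_j alpha_2j * h_j

print("scores  e_2j     =", e)
print("weights alpha_2j =", alpha)
print("context c_2      =", c_2)
```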