Questions NMT Attention
Bahdanau et al.: Neural Machine Translation by Jointly Learning to Align and Translate, 2014
Reading the appendices A, B, and C is not required. The main idea of the paper (described in Section 3.1) is now called "attention" or the "attention mechanism" (or even "Bahdanau-style attention" when referring to this particular implementation).
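For orientation, here is a minimal NumPy sketch of what "Bahdanau-style attention" computes for one decoder step (the alignment model of Section 3.1). The parameter names W_a, U_a, v_a follow the paper's notation, but the function name and the code itself are only an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_context(s_prev, H, W_a, U_a, v_a):
    """One decoder step of Bahdanau-style attention (illustrative sketch).

    s_prev -- previous decoder state s_{i-1}, shape (n,)
    H      -- encoder annotations h_1..h_Tx as rows, shape (Tx, 2n)
    W_a, U_a, v_a -- parameters of the feedforward alignment model
    """
    # e_ij = v_a . tanh(W_a s_{i-1} + U_a h_j)   -- alignment scores
    e = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a
    # alpha_ij = softmax_j(e_ij)                 -- soft-alignment weights
    alpha = softmax(e)
    # c_i = sum_j alpha_ij h_j                   -- context vector
    return alpha @ H, alpha
```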
- What are the limitations of the encoder-decoder approach to machine translation?
- What are the benefits of using a bidirectional RNN in machine translation? Do you think a Bi-RNN helps even without the attention?
- Let's translate a three-word sentence as in the paper, but with a simplified alignment model a(s_{i-1}, h_j) = s_{i-1} · h_j (i.e. using a dot product instead of a feedforward neural network).
  Let the forward hidden states be (0.1, 0.2), (-0.3, 0.4) and (0.5, 0)
  and the backward hidden states be (0.2, 0.2), (0.5, -0.3) and (-0.1, 0.5).
  Suppose that after translating the first word, we have s_1 = (-2, 1, 1, 1).
  Compute c_2. (A small NumPy sketch of this computation is given after the list below.)
- What are the benefits of soft-alignment?
- Introduction to LSTM (and GRU): http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- More tricks in attention-based NMT (e.g. Luong-style attention): https://arxiv.org/pdf/1508.04025.pdf
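As promised above, a small NumPy sketch of the c_2 computation from the third question: dot-product scores e_2j = s_1 · h_j, softmax weights alpha_2j, and the weighted sum of the annotations, following Section 3.1 of the paper. The variable names are ours.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Annotations h_j: forward and backward hidden states concatenated.
h_fwd = np.array([[0.1, 0.2], [-0.3, 0.4], [0.5, 0.0]])
h_bwd = np.array([[0.2, 0.2], [0.5, -0.3], [-0.1, 0.5]])
H = np.concatenate([h_fwd, h_bwd], axis=1)   # shape (3, 4), rows are h_1, h_2, h_3

s_1 = np.array([-2.0, 1.0, 1.0, 1.0])        # decoder state after the first word

# Simplified alignment model from the question: e_2j = s_1 . h_j
e = H @ s_1
alpha = softmax(e)        # attention weights alpha_2j
c_2 = alpha @ H           # context vector c_2 = sum_j alpha_2j * h_j

print("scores  e_2j     =", e)
print("weights alpha_2j =", alpha)
print("context c_2      =", c_2)
```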