Questions BPE
-
Why do we need subword units in NMT?
-
If you have the following vocabulary (`</w>` marks the end of a word):
{'w i n </w>': 3, 'w i n n e r </w>': 2, 'o l d e s t </w>': 6, 'w i d e s t </w>': 1}
What will be the first two BPE merge operations? If multiple answers are possible, list just one of them and explain.
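To experiment with this, here is the BPE learning procedure from Algorithm 1 of the paper (Sennrich et al., 2016), applied to the vocabulary above. Note that `max` breaks frequency ties by dictionary iteration order, which is one reason several answers can be correct:

```python
import re
import collections

def get_stats(vocab):
    """Count the frequency of each adjacent symbol pair in the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, v_in):
    """Replace every occurrence of the symbol pair with its merged form."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in v_in.items()}

vocab = {'w i n </w>': 3, 'w i n n e r </w>': 2,
         'o l d e s t </w>': 6, 'w i d e s t </w>': 1}
for step in (1, 2):
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)   # ties broken by iteration order
    print(f'merge {step}: {best} (frequency {pairs[best]})')
    vocab = merge_vocab(best, vocab)
```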
-
What are the advantages and disadvantages of BPE applied independently to the source and target languages, versus joint BPE? (See the sketch below this question.)
Bonus: Can you come up with some arguments not mentioned in the paper?
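For concreteness, here is a minimal self-contained sketch of the two setups; the toy corpora and the shared name 'Obama' are invented for illustration, not taken from the paper:

```python
import re
import collections

def learn_bpe(vocab, num_merges):
    """Greedy BPE: repeatedly merge the currently most frequent symbol pair."""
    merges = []
    for _ in range(num_merges):
        pairs = collections.Counter()
        for word, freq in vocab.items():
            syms = word.split()
            for pair in zip(syms, syms[1:]):
                pairs[pair] += freq
        if not pairs:                      # every word is a single symbol
            break
        best = max(pairs, key=pairs.get)
        pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(best)) + r'(?!\S)')
        vocab = {pattern.sub(''.join(best), w): f for w, f in vocab.items()}
        merges.append(best)
    return merges

# Toy corpora (invented for illustration): 'Obama' appears on both sides.
src = collections.Counter({'O b a m a </w>': 4, 'h o u s e </w>': 3})
tgt = collections.Counter({'O b a m a </w>': 4, 'H a u s </w>': 3})

separate_src = learn_bpe(dict(src), 10)   # merge table for the source only
separate_tgt = learn_bpe(dict(tgt), 10)   # merge table for the target only
joint = learn_bpe(dict(src + tgt), 10)    # one shared table: strings copied
                                          # between languages segment identically
```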
-
Section 5.1 says: "For the 50 000 most frequent words, the representation is the same for all neural networks, and all neural networks achieve comparable unigram F1 for this category."
Do all of the 50 000 most frequent German words (by training-set frequency) really have the same representation in BPE-J90k and WDict? Why? Can you prove it?
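One way to probe this empirically (a sanity check rather than a proof; the file names below are hypothetical): a word stays a single unit under BPE exactly when its fully merged form appears among the symbols created by the learned merge operations.

```python
# Empirical check (not a proof). Assumes two hypothetical files:
#   codes.bpe  - learned merges, one "left right" pair per line, in the
#                format written by subword-nmt's learn-bpe
#   top50k.txt - the 50 000 most frequent training words, one per line
symbols = set()
with open('codes.bpe', encoding='utf-8') as f:
    for line in f:
        if not line.strip() or line.startswith('#'):  # skip "#version" header
            continue
        left, right = line.split()
        symbols.add(left + right)          # each merge creates one new symbol

with open('top50k.txt', encoding='utf-8') as f:
    words = [w.strip() for w in f if w.strip()]

# A word ends up as a single unit iff its fully merged form (word plus the
# end-of-word marker) was produced by some merge, or it is a single character.
single = sum(w + '</w>' in symbols or len(w) == 1 for w in words)
print(f'{single} / {len(words)} of the top words are a single BPE symbol')
```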
-
Bonus: In Figure 2, can you explain the drop in unigram F1 of BPE-J90k, C2-50k and C2-200/500k around rank 400k?