Questions UnsupNMT - ufal/NPFL095 GitHub Wiki

What is the difference between supervised and unsupervised approaches to machine translation? How would you classify the approach of Johnson et al. (2017) mentioned in Section 2.3?
What is the point of introducing noise to the input? Imagine a system trained as described in the paper, but without the noising. What would the outputs ("translations") of such a system look like?
How is the parallel corpus used in the semi-supervised setting?
In the quantitative analysis (Section 5.1), the authors mention the need to better handle numerals. Later, in the qualitative analysis (Section 5.2), they show some problems with dates and numbers, and say that "they are also understandable given the unsupervised nature of the system". Why is that?
What conditions should a language pair meet to benefit the most from unsupervised machine translation? Can you think of a pair that would be a good candidate? (You can get an idea of the amount of parallel data available for a language pair at https://opus.nlpl.eu/.)