Questions BPE Dropout - ufal/NPFL095 GitHub Wiki

What is the main disadvantage of BPE which BPE-Dropout tries to solve?
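To make the setting of the question concrete, here is a minimal sketch of BPE-Dropout-style segmentation. It assumes a merge table given as a priority-ordered list of symbol pairs, and it assumes that at every merge step each applicable pair occurrence is skipped independently with probability p. The function name `bpe_dropout_segment` and the toy merge table are hypothetical illustrations, not the paper's code or any library's API.

```python
import random


def bpe_dropout_segment(word, merges, p=0.1, rng=None):
    """Hypothetical sketch: segment `word` with a BPE merge table plus dropout.

    Assumes `merges` is a list of symbol pairs ordered by priority (earlier =
    applied first). At each step, every applicable pair occurrence is skipped
    with probability p; the best surviving one is merged.
    """
    rng = rng or random.Random()
    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Applicable merges that survive dropout, as (priority, position).
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in rank and rng.random() >= p
        ]
        if not candidates:
            break  # nothing applicable survived dropout: stop merging
        _, i = min(candidates)  # highest-priority merge, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical toy merge table.
merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
print(bpe_dropout_segment("lower", merges, p=0.0))  # ['lower']
for seed in range(3):
    print(bpe_dropout_segment("lower", merges, p=0.3, rng=random.Random(seed)))
```

With p=0 the sketch always applies the best available merge, so it behaves like ordinary BPE segmentation. With p>0, repeated calls on the same word may return different segmentations, while the concatenated pieces always spell the original word.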
Bonus: In the Charagrams approach, each word is represented by its character n-grams, where <w> and </w> mark the beginning and end of the word. For example, when using 4-grams and 5-grams, the word "unrelated" is represented by the 4-grams <w>unr, unre, nrel, rela, elat, late, ated, ted</w> and the 5-grams <w>unre, unrel, nrela, relat, elate, lated, ated</w> (in practice, only the subset of these n-grams that are frequent in the training data). Guess what the (dis)advantages of BPE-Dropout vs. Charagrams are.
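The n-gram enumeration in the "unrelated" example can be reproduced with a short sketch. It assumes that <w> and </w> each count as a single symbol, which matches the listed n-grams. The frequency filtering is modelled by an optional `vocab` set. The name `char_ngrams` and the `vocab` parameter are hypothetical, not Charagram's published code.

```python
def char_ngrams(word, n_min=4, n_max=5, vocab=None):
    """Hypothetical sketch: character n-grams of `word`, n_min <= n <= n_max.

    <w> and </w> are treated as single boundary symbols. If `vocab` is given
    (an assumed stand-in for "frequent in the training data"), only n-grams
    found in it are kept.
    """
    symbols = ["<w>"] + list(word) + ["</w>"]
    grams = [
        "".join(symbols[i:i + n])
        for n in range(n_min, n_max + 1)          # all 4-grams, then all 5-grams
        for i in range(len(symbols) - n + 1)      # every start position
    ]
    if vocab is not None:
        grams = [g for g in grams if g in vocab]
    return grams


print(char_ngrams("unrelated"))
```

The output lists the eight 4-grams followed by the seven 5-grams from the example above.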