Questions wav2vec - ufal/NPFL095 GitHub Wiki

Why is the quantization module used in the paper? What is the purpose of codebooks?
What is the difference between softmax and Gumbel softmax? Why is the Gumbel softmax used here?
What properties of the model's output does the contrastive loss encourage? And what is the function of the diversity loss?
Give at least three examples of sets of classes into which the output of the pre-trained model can be classified during fine-tuning.
What is the difference between the LARGE and BASE models described in the paper? Which one is the better choice, and why?