Questions BLEURT - ufal/NPFL095 GitHub Wiki
- What function does BLEURT learn? Explain its components.
- Why do traditional overlap-based metrics not always correlate well with human judgments, and how does BLEURT address these limitations?
- What is meant by the ‘quality drift’ mentioned in the paper? Why is it a problem for learned evaluation metrics?
- How does the synthetic pre-training scheme try to anticipate certain errors produced by text generation systems?