Questions BLEURT - ufal/NPFL095 GitHub Wiki

  1. What function does BLEURT learn? Explain its components.

  2. Why do traditional overlap-based metrics not always correlate well with human judgments, and how does BLEURT address these limitations?

  3. What does the paper mean by ‘quality drift’? Why is it a problem for learned evaluation metrics?

  4. How does the synthetic pre-training scheme try to anticipate certain errors produced by text generation systems?