Questions: Let’s Verify Step by Step

  1. What is the difference between outcome supervision and process supervision? Which one has a higher computational cost?

  2. What is the MATH dataset and why is it challenging for language models? Why did the authors choose it to explore the model's reasoning abilities?

  3. Based on the information in Appendix G and the main part of the paper, suggest an explanation for why an earlier work using school-level math found similar performance for the PRM and ORM approaches.

  4. How did the authors handle ambiguous responses in this study?

  5. What is active learning? In which way did the active learning technique improve efficiency in this study?

Bonus: Is it possible to improve the use of active learning in this paper, especially for a larger number of solutions per problem?