Questions Let’s Verify Step by Step - ufal/NPFL095 GitHub Wiki
- What is the difference between outcome supervision and process supervision? Which one has a higher computational cost?
- What is the MATH dataset and why is it challenging for language models? Why did the authors choose it to explore the model's reasoning abilities?
- Based on the information in Appendix G and the main part of the paper, suggest an explanation for why one of the previous works, which used school-level math problems, obtained similar performance for the PRM and ORM approaches.
- How did the authors handle ambiguous responses in this study?
- What is active learning? In what way did the active learning technique improve efficiency in this study?
Bonus: Is it possible to make better use of active learning in this paper, especially with a larger number of samples per problem?