2025년 3월 31일
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
(Wei Shen, Guanlin Liu, Zheng Wu, Ruofei Zhu, Qingping Yang, Chao Xin, Yu Yue, Lin Yan)
Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. While recent research has focused on algorithmic improvements, the importance of prompt-data construction has been overlooked. This paper addresses this gap by exploring data-driven bottlenecks in RLHF performance scaling, particularly reward hacking and decreasing response diversity. We introduce a hybrid reward system combining reasoning task verifiers (RTV) and a generative reward model (GenRM) to mitigate reward hacking. We also propose a novel prompt-selection method, Pre-PPO, to maintain response diversity and enhance learning effectiveness. Additionally, we find that prioritizing mathematical and coding tasks early in RLHF training significantly improves performance. Experiments across two model sizes validate our methods' effectiveness and scalability. Results show that RTV is most resistant to reward hacking, followed by GenRM with ground truth, and then GenRM with SFT Best-of-N responses. Our strategies enable rapid capture of subtle task-specific distinctions, leading to substantial improvements in overall RLHF performance. This work highlights the importance of careful data construction and provides practical methods to overcome performance barriers in RLHF.
RLHF 과정의 Reward Hacking과 다양성 감소에 대한 대응. Verifiable Reward와 GT를 사용하는 Reward Model을 추가로 사용, Reward Score가 낮은 프롬프트를 선택, 그리고 수학과 코딩에 대해서 먼저 학습시키는 전략을 사용했군요.
This paper addresses reward hacking and decreasing response diversity in the RLHF process. The strategies employed include using additional verifiable rewards and a reward model that utilizes ground truth, selecting prompts with low reward scores, and prioritizing mathematics and coding tasks in the early stages of training.
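A rough sketch of how I picture the two ideas, purely from the abstract: a hybrid reward that routes verifiable reasoning tasks to a rule-based checker (RTV) and everything else to a generative reward model (GenRM), plus Pre-PPO-style selection that keeps the prompts whose baseline responses score lowest under the reward model. Every function name here (`verifier`, `genrm`, `generate_baseline`, `score`) is a hypothetical stand-in, not the paper's API.

```python
from typing import Callable, List, Tuple

RewardFn = Callable[[str, str], float]  # (prompt, response) -> scalar reward


def hybrid_reward(prompt: str, response: str, task_type: str,
                  verifier: RewardFn, genrm: RewardFn) -> float:
    """Route reasoning tasks to a rule-based verifier (RTV); use GenRM otherwise."""
    if task_type in {"math", "code"}:
        return verifier(prompt, response)   # e.g., answer checking or unit tests
    return genrm(prompt, response)          # generative reward model score


def select_low_reward_prompts(prompts: List[str],
                              generate_baseline: Callable[[str], str],
                              score: RewardFn,
                              keep_fraction: float = 0.3) -> List[str]:
    """Keep the prompts whose baseline (e.g., SFT) responses score lowest,
    assuming low reward marks prompts the policy has not yet mastered."""
    scored: List[Tuple[float, str]] = [
        (score(p, generate_baseline(p)), p) for p in prompts
    ]
    scored.sort(key=lambda pair: pair[0])               # lowest reward first
    n_keep = max(1, int(len(scored) * keep_fraction))
    return [p for _, p in scored[:n_keep]]
```

In this reading, the selected prompts would then be fed to PPO, with the mathematics and coding subset weighted toward the earliest training stages, following the paper's finding about task ordering.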
#rlhf #reward-model
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
(Belinda Z. Li, Been Kim, Zi Wang)
Recently, a large amount of work has focused on improving large language models' (LLMs') performance on reasoning benchmarks such as math and logic. However, past work has largely assumed that tasks are well-defined. In the real world, queries to LLMs are often underspecified, only solvable through acquiring missing information. We formalize this as a constraint satisfaction problem (CSP) with missing variable assignments. Using a special case of this formalism where only one necessary variable assignment is missing, we can rigorously evaluate an LLM's ability to identify the minimal necessary question to ask and quantify axes of difficulty levels for each problem. We present QuestBench, a set of underspecified reasoning tasks solvable by asking at most one question, which includes: (1) Logic-Q: Logical reasoning tasks with one missing proposition, (2) Planning-Q: PDDL planning problems with initial states that are partially-observed, (3) GSM-Q: Human-annotated grade school math problems with one missing variable assignment, and (4) GSME-Q: a version of GSM-Q where word problems are translated into equations by human annotators. The LLM is tasked with selecting the correct clarification question(s) from a list of options. While state-of-the-art models excel at GSM-Q and GSME-Q, their accuracy is only 40-50% on Logic-Q and Planning-Q. Analysis demonstrates that the ability to solve well-specified reasoning problems may not be sufficient for success on our benchmark: models have difficulty identifying the right question to ask, even when they can solve the fully specified version of the problem. Furthermore, in the Planning-Q domain, LLMs tend not to hedge, even when explicitly presented with the option to predict "not sure." This highlights the need for deeper investigation into models' information acquisition capabilities.
문제를 푸는 데 필요한 정보가 무엇인지 질문할 수 있는가를 평가하는 벤치마크. 자연스러운 질문은 문제를 푸는 능력과 질문하는 능력이 어떻게 연관되는가일 텐데 이 둘이 바로 연관되지 않는다는 이야기를 합니다.
This paper presents a benchmark for evaluating whether a model can ask the right question to obtain the information needed to solve a problem. The natural follow-up is how problem-solving ability relates to question-asking ability, and the authors report that the two are not directly linked: models often fail to identify the right question even when they can solve the fully specified version of the problem.
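To make the CSP framing concrete, here is a toy rendition of the GSM-Q/GSME-Q setup as I read the abstract: equations relate variables, one needed assignment is withheld, and the task is to identify the single variable one should ask about. The encoding and the propagation rule ("an equation with exactly one unknown determines it") are my own simplification, not QuestBench's actual data format.

```python
from typing import FrozenSet, List, Set

Equation = FrozenSet[str]  # the set of variables an equation relates


def derivable(target: str, known: Set[str], equations: List[Equation]) -> bool:
    """Propagate knowledge: any equation with exactly one unknown determines it."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for eq in equations:
            unknown = eq - known
            if len(unknown) == 1:
                known |= unknown
                changed = True
    return target in known


def minimal_questions(target: str, known: Set[str], candidates: List[str],
                      equations: List[Equation]) -> List[str]:
    """Variables whose value alone turns the underspecified problem into a solvable one."""
    if derivable(target, known, equations):
        return []                        # already well-specified, nothing to ask
    return [v for v in candidates if derivable(target, known | {v}, equations)]


# total = apples + oranges, oranges depends on baskets; apples is known,
# so the minimal clarification question is the value of `baskets`.
equations = [frozenset({"total", "apples", "oranges"}), frozenset({"oranges", "baskets"})]
print(minimal_questions("total", {"apples"}, ["apples", "baskets"], equations))  # ['baskets']
```

Running this prints `['baskets']`: the one missing assignment that makes the target solvable, which corresponds to the clarification question the benchmark expects the model to pick from the option list.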
#reasoning #benchmark