September 23, 2024
RRM: Robust Reward Model Training Mitigates Reward Hacking
(Tianqi Liu, Wei Xiong, Jie Ren, Lichang Chen, Junru Wu, Rishabh Joshi, Yang Gao, Jiaming Shen, Zhen Qin, Tianhe Yu, Daniel Sohn, Anastasiia Makarova, Jeremiah Liu, Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh)
Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from prompt-independent artifacts, such as response length and format. In this work, we expose a fundamental limitation of current RM training methods, where RMs fail to effectively distinguish between contextual signals and irrelevant artifacts when determining preferences. To address this, we introduce a causal framework that learns preferences independent of these artifacts and propose a novel data augmentation technique designed to eliminate them. Extensive experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model (RRM). Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it, on RewardBench, increasing accuracy from 80.61% to 84.15%. Additionally, we train two DPO policies using both the RM and RRM, demonstrating that the RRM significantly enhances DPO-aligned policies, improving MT-Bench scores from 7.27 to 8.31 and length-controlled win-rates in AlpacaEval-2 from 33.46% to 52.49%.
A data augmentation method for training a robust reward model. The basic idea is that factors which depend only on the generated response, and not on the prompt (such as length or formatting), should not influence the preference. To enforce this, they shuffle responses across the dataset and add the resulting pairs as negatives or ties. Training with in-dataset responses repurposed as negatives is something that, I believe, has been done before.
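A minimal sketch of the shuffling idea, assuming a preference dataset represented as a list of dicts with "prompt", "chosen", and "rejected" fields. The field names, the sampling scheme, and the choice of a single off-prompt negative per example are illustrative assumptions, not the paper's exact construction (which also covers tie labels).

```python
import random

def augment_with_off_prompt_negatives(dataset, num_extra_per_example=1, seed=0):
    """Illustrative sketch: pair each prompt's chosen response against responses
    drawn from *other* prompts, labeling the off-prompt response as rejected.
    A response written for an unrelated prompt should never win, so the reward
    model is pushed to rely on prompt-response relevance rather than on
    prompt-independent artifacts such as length or formatting."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for i, ex in enumerate(dataset):
        for _ in range(num_extra_per_example):
            j = rng.randrange(len(dataset))
            if j == i:
                continue  # skip pairing a response with its own prompt
            off_prompt_response = rng.choice(
                [dataset[j]["chosen"], dataset[j]["rejected"]]
            )
            augmented.append({
                "prompt": ex["prompt"],
                "chosen": ex["chosen"],           # on-prompt response stays preferred
                "rejected": off_prompt_response,  # off-prompt response as the negative
            })
    return augmented
```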
#reward-model
Exploring Scaling Laws for Local SGD in Large Language Model Training
(Qiaozhi He, Xiaomin Zhuang, Zhihua Wu)
This paper investigates scaling laws for local SGD in LLM training, a distributed optimization algorithm that facilitates training on loosely connected devices. Through extensive experiments, we show that local SGD achieves competitive results compared to conventional methods, given equivalent model parameters, datasets, and computational resources. Furthermore, we explore the application of local SGD in various practical scenarios, including multi-cluster setups and edge computing environments. Our findings elucidate the necessary conditions for effective multi-cluster LLM training and examine the potential and limitations of leveraging edge computing resources in the LLM training process. This demonstrates its viability as an alternative to single large-cluster training.
A scaling law for local SGD. It accounts for the convergence penalty that local SGD incurs, and this penalty is defined as a function of efficiency.
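As a rough illustration of what folding such a penalty into a scaling law could look like (the functional form below is my own assumption for illustration, not the paper's actual parameterization): a Chinchilla-style loss in which an efficiency factor discounts how effectively the data budget is used under local SGD.

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{(\eta D)^{\beta}}, \qquad 0 < \eta \le 1,
$$

where $\eta$ captures the convergence penalty of local SGD relative to fully synchronous training, with $\eta = 1$ recovering the standard form.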
There has been talk of async training being studied as a workaround for the problems that come with scaling. It hasn't produced particularly good results so far, but if people keep digging at it, something may eventually come of it. (Though I suspect any real result would immediately be treated as a trade secret.)
#scaling-law #efficient-training