April 26, 2024
Snowflake Arctic
Snowflake has released Arctic, an MoE model with 17B active / 480B total parameters, top-2 routing over 128 experts, trained on 3.5T tokens.
https://www.snowflake.com/en/data-cloud/arctic/cookbook/
More interesting than the model itself is that they plan to release the code, training methods, and recipes. It should be worth waiting for.
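As a quick back-of-the-envelope check on how the active and total counts relate, here is a small sketch. The ~10B dense trunk and ~3.66B-per-expert figures are assumptions based on Snowflake's description of a dense-MoE hybrid, so treat them as approximate.

```python
# Rough sanity check of Arctic's parameter counts, assuming a ~10B dense
# trunk plus 128 experts of ~3.66B parameters each with top-2 routing.
dense = 10e9
per_expert = 3.66e9
num_experts, top_k = 128, 2

total = dense + num_experts * per_expert   # ~478B, matching the quoted ~480B
active = dense + top_k * per_expert        # ~17.3B, matching the quoted ~17B
print(f"total ≈ {total / 1e9:.0f}B, active ≈ {active / 1e9:.1f}B")
```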
#llm
Weak-to-Strong Extrapolation Expedites Alignment
(Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng)
Although the capabilities of large language models (LLMs) ideally scale up with increasing data and compute, they are inevitably constrained by limited resources in reality. Suppose we have a moderately trained LLM (e.g., trained to align with human preference) in hand, can we further exploit its potential and cheaply acquire a stronger model? In this paper, we propose a simple method called ExPO to boost LLMs' alignment with human preference. ExPO assumes that a medium-aligned model can be interpolated between a less-aligned (weaker) model, e.g., the initial SFT model, and a better-aligned (stronger) one, thereby directly obtaining this stronger model by extrapolating from the weights of the former two relatively weaker models. On the AlpacaEval 2.0 benchmark, we show that ExPO pushes models trained with less preference data (e.g., 10% or 20%) to reach and even surpass the fully-trained one, without any additional training. Furthermore, ExPO also significantly improves off-the-shelf DPO/RLHF models and exhibits decent scalability across model sizes from 7B to 70B. Our work demonstrates the efficacy of model extrapolation in exploiting LLMs' capabilities, suggesting a promising direction that deserves future exploration.
The idea: since a medium-aligned model can be obtained as a linear combination of a more-aligned and a less-aligned model, extrapolating from the medium-aligned model away from the less-aligned one, i.e., along the difference between the two, should in turn yield a more-aligned model. Fun idea.
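A minimal sketch of the weight extrapolation under these assumptions: the two checkpoints share parameter names, and the coefficient `alpha` is a hyperparameter whose value here is purely illustrative (the paper tunes it rather than fixing it).

```python
import torch

def expo(weak_state_dict, medium_state_dict, alpha=0.3):
    """Weight extrapolation in the spirit of ExPO.

    theta_strong = theta_medium + alpha * (theta_medium - theta_weak),
    i.e., keep moving in the direction alignment training already took,
    away from the weaker (e.g., initial SFT) checkpoint.
    """
    strong = {}
    for name, w_medium in medium_state_dict.items():
        w_weak = weak_state_dict[name]
        strong[name] = w_medium + alpha * (w_medium - w_weak)
    return strong

# Toy usage with dummy tensors standing in for model weights.
weak = {"layer.weight": torch.zeros(2, 2)}
medium = {"layer.weight": torch.ones(2, 2)}
print(expo(weak, medium, alpha=0.5)["layer.weight"])  # all entries 1.5
```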
#alignment #model-merge
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
(Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale)
Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM contributes (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples for understanding collective welfare (UK and US); and (iii) individualised feedback where every rating is linked to a detailed participant profile, thus permitting exploration of personalisation and attribution of sample artefacts. We focus on collecting conversations that centre subjective and multicultural perspectives on value-laden and controversial topics, where we expect the most interpersonal and cross-cultural disagreement. We demonstrate the usefulness of PRISM via three case studies of dialogue diversity, preference diversity, and welfare outcomes, showing that it matters which humans set alignment norms. As well as offering a rich community resource, we advocate for broader participation in AI development and a more inclusive approach to technology design.
A dataset of conversations with LLMs and the accompanying feedback, collected from participants with diverse demographic backgrounds across many cultures. If we are going to inject human preferences, we need to know what is actually preferred. It should be interesting to analyze.
#alignment