December 24, 2024
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
(Vivek Ramanujan, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer, Ali Farhadi)
Current image generation methods, such as latent diffusion and discrete token-based generation, depend on a two-stage training approach. In stage 1, an auto-encoder is trained to compress an image into a latent space; in stage 2, a generative model is trained to learn a distribution over that latent space. Most work focuses on maximizing stage 1 performance independent of stage 2, assuming better reconstruction always leads to better generation. However, we show this is not strictly true. Smaller stage 2 models can benefit from more compressed stage 1 latents even if reconstruction performance worsens, showing a fundamental trade-off between compression and generation modeling capacity. To better optimize this trade-off, we introduce Causally Regularized Tokenization (CRT), which uses knowledge of the stage 2 generation modeling procedure to embed useful inductive biases in stage 1 latents. This regularization makes stage 1 reconstruction performance worse, but makes stage 2 generation performance better by making the tokens easier to model: we are able to improve compute efficiency 2-3$\times$ over baseline and match state-of-the-art discrete autoregressive ImageNet generation (2.18 FID) with less than half the tokens per image (256 vs. 576) and a fourth the total model parameters (775M vs. 3.1B) as the previous SOTA (LlamaGen).
This work addresses the observation that better performance in the VQ stage of an image generation pipeline built from a VQ tokenizer and an autoregressive model does not necessarily translate into better generation. The authors analyze the problem through scaling laws, then try to hit the sweet spot of the trade-off by attaching an autoregressive model to the VQ stage and adding its loss to tokenizer training.
Like an LLM tokenizer, the VQ tokenizer is a component trained separately from the autoregressive model, so this kind of mismatch is bound to arise. One way to handle it is to choose the trade-off point carefully, and, as usual, scaling laws prove consistently effective for that kind of question. A hedged sketch of the idea follows below.
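To make the mechanism concrete, here is a minimal PyTorch sketch, not the paper's CRT implementation: a toy VQ-VAE whose straight-through latents are also read by a small causal prior, with the prior's next-token loss added to the tokenizer objective so that "hard to model autoregressively" is penalized during stage 1. The `ToyVQTokenizer` and `TinyARPrior` modules, their sizes, and the weight `lambda_ar` are illustrative assumptions.

```python
# Hedged sketch: VQ tokenizer regularized by a small causal prior.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVQTokenizer(nn.Module):
    """Encoder -> nearest-code quantization (straight-through) -> decoder."""
    def __init__(self, codebook_size=512, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU(), nn.Conv2d(dim, dim, 4, 2, 1))
        self.codebook = nn.Embedding(codebook_size, dim)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, dim, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(dim, 3, 4, 2, 1))

    def forward(self, x):
        z = self.encoder(x)                                  # (B, D, H, W)
        B, D, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, D)
        ids = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        q = self.codebook(ids).view(B, H, W, D).permute(0, 3, 1, 2)
        q_st = z + (q - z).detach()                          # straight-through estimator
        return self.decoder(q_st), q_st, q, z, ids.view(B, H * W)

class TinyARPrior(nn.Module):
    """Small causal transformer over the quantized latent sequence; it predicts
    the next code index, and its loss also back-propagates into the encoder."""
    def __init__(self, dim=64, codebook_size=512, n_layer=2, n_head=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, n_head, dim_feedforward=4 * dim,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(dim, codebook_size)

    def forward(self, h):                                    # h: (B, T, D)
        T = h.size(1)
        causal = torch.full((T, T), float("-inf"), device=h.device).triu(1)
        return self.head(self.blocks(h, mask=causal))

def crt_style_step(tok, prior, x, lambda_ar=0.1):
    """Reconstruction + VQ losses plus a causal 'how easy are these tokens to
    model autoregressively' regularizer (a hedged analogue of the CRT idea)."""
    recon, q_st, q, z, ids = tok(x)
    loss_rec = F.mse_loss(recon, x)
    loss_vq = F.mse_loss(q, z.detach()) + 0.25 * F.mse_loss(z, q.detach())
    B, D, H, W = q_st.shape
    h = q_st.permute(0, 2, 3, 1).reshape(B, H * W, D)
    logits = prior(h[:, :-1])                                # predict token t from tokens < t
    loss_ar = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                              ids[:, 1:].reshape(-1))
    return loss_rec + loss_vq + lambda_ar * loss_ar

# Usage: loss = crt_style_step(ToyVQTokenizer(), TinyARPrior(), torch.randn(2, 3, 32, 32))
```

Raising `lambda_ar` in a setup like this should trade reconstruction quality for token sequences that are cheaper for a stage 2 model to fit, which is the trade-off the paper quantifies.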
#vq #autoregressive-model #scaling-law
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
(Weihao Zeng, Yuzhen Huang, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He)
In the absence of extensive human-annotated data for complex reasoning tasks, self-improvement -- where models are trained on their own outputs -- has emerged as a primary method for enhancing performance. However, the critical factors underlying the mechanism of these iterative self-improving methods remain poorly understood, such as under what conditions self-improvement is effective, and what are the bottlenecks in the current iterations. In this work, we identify and propose methods to monitor two pivotal factors in this iterative process: (1) the model's ability to generate sufficiently diverse responses (exploration); and (2) the effectiveness of external rewards in distinguishing high-quality candidates from lower-quality ones (exploitation). Using mathematical reasoning as a case study, we begin with a quantitative analysis to track the dynamics of exploration and exploitation, discovering that a model's exploratory capabilities rapidly deteriorate over iterations, and the effectiveness of exploiting external rewards diminishes as well. Motivated by these findings, we introduce B-STaR, a Self-Taught Reasoning framework that autonomously adjusts configurations across iterations to Balance exploration and exploitation, thereby optimizing the self-improving effectiveness based on the current policy model and available rewards. Our experiments on mathematical reasoning, coding, and commonsense reasoning demonstrate that B-STaR not only enhances the model's exploratory capabilities throughout training but also achieves a more effective balance between exploration and exploitation, leading to superior performance.
This work explores the balance, during self-improvement, between generating diverse samples (exploration) and selecting good ones (exploitation). The rapid collapse of diversity over self-improvement iterations is a frequently raised issue (https://arxiv.org/abs/2412.02674). To address it, the authors build metrics for this balance and use them to set the sampling temperature and reward score threshold at each iteration; a hedged sketch of such a loop follows below.
As a side note, I am not sure why Pass@32-1 and Reward@32-1 differ.
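Here is a minimal, hypothetical Python sketch of the "monitor, then re-tune" loop described above: per iteration, estimate an exploration proxy (Pass@K-style: some sampled answer is correct) and an exploitation proxy (Reward@K-style: the reward-selected answer clears a threshold and is correct), then pick the (temperature, threshold) pair that best balances the two. The `sample_k` / `reward_fn` / `is_correct` interfaces and the harmonic-mean balance score are my own assumptions, not the paper's exact definitions.

```python
# Hedged sketch of balancing exploration and exploitation across iterations.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class Config:
    temperature: float
    reward_threshold: float

def balance_metrics(
    queries: Sequence[str],
    answers: Sequence[str],
    sample_k: Callable[[str, float], List[str]],  # prompt, temperature -> K samples
    reward_fn: Callable[[str, str], float],       # prompt, response -> scalar reward
    is_correct: Callable[[str, str], bool],       # response, gold answer -> bool
    cfg: Config,
) -> Tuple[float, float]:
    """Return (exploration, exploitation) rates for one configuration."""
    explored = exploited = 0
    for q, gold in zip(queries, answers):
        samples = sample_k(q, cfg.temperature)
        if any(is_correct(s, gold) for s in samples):
            explored += 1                         # diverse enough to hit a solution
        best_r, best_s = max((reward_fn(q, s), s) for s in samples)
        if best_r >= cfg.reward_threshold and is_correct(best_s, gold):
            exploited += 1                        # reward kept and ranked a good one first
    n = max(len(queries), 1)
    return explored / n, exploited / n

def select_config(
    queries, answers, sample_k, reward_fn, is_correct,
    temperatures=(0.6, 0.8, 1.0, 1.2),
    thresholds=(0.0, 0.25, 0.5),
) -> Config:
    """Grid-search temperature and reward threshold each iteration, keeping the
    configuration with the best balance score (harmonic mean, purely illustrative)."""
    best, best_score = Config(temperatures[0], thresholds[0]), -1.0
    for t in temperatures:
        for th in thresholds:
            cfg = Config(t, th)
            explore, exploit = balance_metrics(queries, answers, sample_k,
                                               reward_fn, is_correct, cfg)
            score = 0.0 if explore + exploit == 0 else \
                2 * explore * exploit / (explore + exploit)
            if score > best_score:
                best, best_score = cfg, score
    return best
```

The point of the sketch is simply that both quantities are measured on held-out queries before each self-improvement round, so the sampling configuration can be re-tuned as the policy's diversity decays.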
#search #rl #self-improvement