October 21, 2024
Decomposing The Dark Matter of Sparse Autoencoders
(Joshua Engels, Logan Riggs, Max Tegmark)
Sparse autoencoders (SAEs) are a promising technique for decomposing language model activations into interpretable linear features. However, current SAEs fall short of completely explaining model performance, resulting in "dark matter": unexplained variance in activations. This work investigates dark matter as an object of study in its own right. Surprisingly, we find that much of SAE dark matter--about half of the error vector itself and >90% of its norm--can be linearly predicted from the initial activation vector. Additionally, we find that the scaling behavior of SAE error norms at a per-token level is remarkably predictable: larger SAEs mostly struggle to reconstruct the same contexts as smaller SAEs. We build on the linear representation hypothesis to propose models of activations that might lead to these observations, including postulating a new type of "introduced error"; these insights imply that the part of the SAE error vector that cannot be linearly predicted ("nonlinear" error) might be fundamentally different from the linearly predictable component. To validate this hypothesis, we empirically analyze nonlinear SAE error and show that 1) it contains fewer not yet learned features, 2) SAEs trained on it are quantitatively worse, 3) it helps predict SAE per-token scaling behavior, and 4) it is responsible for a proportional amount of the downstream increase in cross entropy loss when SAE activations are inserted into the model. Finally, we examine two methods to reduce nonlinear SAE error at a fixed sparsity: inference time gradient pursuit, which leads to a very slight decrease in nonlinear error, and linear transformations from earlier layer SAE outputs, which leads to a larger reduction.
An analysis of the irreducible error in sparse autoencoders. Interestingly, a significant portion of the error variance can be predicted by a linear projection of the activations. What do these irreducible errors imply for interpretability?
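To make the paper's central measurement concrete, here is a minimal sketch (all names, shapes, and data are hypothetical stand-ins, not the authors' code) of fitting a linear map from activations to the SAE error and checking how much of the error variance it explains:

```python
# Sketch: linearly predict the SAE error e = x - SAE(x) from the activation x.
import torch

d_model = 768       # hypothetical activation width
n_tokens = 10_000   # hypothetical sample size

# Stand-ins for real data: x would be residual-stream activations,
# sae_recon a trained sparse autoencoder's reconstruction of x.
x = torch.randn(n_tokens, d_model)
sae_recon = x + 0.1 * torch.randn(n_tokens, d_model)  # placeholder reconstruction
error = x - sae_recon                                 # the SAE "dark matter"

# Closed-form least squares: W = argmin_W ||x @ W - error||^2
W = torch.linalg.lstsq(x, error).solution
pred = x @ W

# Fraction of error variance left unexplained by the linear prediction
fvu = ((error - pred) ** 2).sum() / ((error - error.mean(0)) ** 2).sum()
print(f"fraction of error variance unexplained: {fvu:.3f}")
```

With real activations, the paper's finding would correspond to this fraction being surprisingly low; with the random placeholder data above it stays near 1.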
#mechanistic-interpretation
MoDification: Mixture of Depths Made Easy
(Chen Zhang, Meizhi Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, Chengqiang Lu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang, Dawei Song)
Long-context efficiency has recently become a trending topic in serving large language models (LLMs), and mixture of depths (MoD) has been proposed as a natural fit for bringing down both latency and memory. In this paper, however, we discover that MoD can barely transform existing LLMs without costly training over an extensive number of tokens. To enable the transformation of any LLM into a MoD one, we show that the top-k operator in MoD should be promoted to a threshold-p operator, and that the architecture and training data should be refined accordingly. These designs form our method, termed MoDification. Through a comprehensive set of experiments covering model scales from 3B to 70B, we show that MoDification strikes an excellent balance between efficiency and effectiveness. MoDification can achieve up to ~1.2x speedup in latency and ~1.8x reduction in memory compared to the original LLMs, especially in long-context applications.
The idea of using a probability threshold instead of top-k in Mixture of Depths. This seems similar in spirit to the predictor the original Mixture of Depths paper used to handle the autoregressive setting.
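As a rough illustration of the routing change, here is a minimal sketch (hypothetical, not the paper's code) contrasting top-k routing with threshold-p routing. The key difference: under threshold-p each token decides independently from its own router score, which is what makes it convenient for autoregressive decoding, where the full sequence needed by top-k is not available.

```python
import torch

def topk_route(router_logits: torch.Tensor, k: int) -> torch.Tensor:
    """Boolean mask selecting exactly the k highest-scoring tokens per sequence."""
    idx = router_logits.topk(k, dim=-1).indices
    mask = torch.zeros_like(router_logits, dtype=torch.bool)
    return mask.scatter(-1, idx, True)

def threshold_p_route(router_logits: torch.Tensor, p: float) -> torch.Tensor:
    """Boolean mask selecting tokens whose routing probability exceeds p.
    Each token's decision depends only on itself, so no look-ahead is needed."""
    return router_logits.sigmoid() > p

logits = torch.randn(2, 16)                      # (batch, seq_len) router scores
print(topk_route(logits, k=4).sum(-1))           # exactly 4 tokens per sequence
print(threshold_p_route(logits, p=0.5).sum(-1))  # variable count per sequence
```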
#moe
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
(Shaozhe Hao, Xuantong Liu, Xianbiao Qi, Shihao Zhao, Bojia Zi, Rong Xiao, Kai Han, Kwan-Yee K. Wong)
We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. BiGR is the first conditional generative model that unifies generation and discrimination within the same framework. BiGR features a binary tokenizer, a masked modeling mechanism, and a binary transcoder for binary code prediction. Additionally, we introduce a novel entropy-ordered sampling method to enable efficient image generation. Extensive experiments validate BiGR's superior performance in generation quality, as measured by FID-50k, and representation capabilities, as evidenced by linear-probe accuracy. Moreover, BiGR showcases zero-shot generalization across various vision tasks, enabling applications such as image inpainting, outpainting, editing, interpolation, and enrichment, without the need for structural modifications. Our findings suggest that BiGR unifies generative and discriminative tasks effectively, paving the way for further advancements in the field.
An experiment combining binary quantization in the style of Lookup-Free Quantization with masked image modeling. The comparisons against autoregressive models and against methods that predict binary codes as categorical targets are particularly interesting.
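For intuition on the binary latent codes, here is a minimal sketch (my assumption of the general LFQ-style mechanism, not BiGR's implementation) of sign-based binarization with a straight-through estimator so the encoder still receives gradients:

```python
import torch

def binary_quantize(z: torch.Tensor) -> torch.Tensor:
    """Quantize each latent dimension to +/-1 by sign; the straight-through
    trick makes the backward pass treat quantization as the identity."""
    z_q = torch.where(z > 0, torch.ones_like(z), -torch.ones_like(z))
    return z + (z_q - z).detach()

z = torch.randn(4, 256, requires_grad=True)  # hypothetical (batch, code_dim) latent
codes = binary_quantize(z)                   # values are +/-1 in the forward pass
codes.sum().backward()                       # gradient flows through the STE
print(z.grad.abs().mean())                   # nonzero: the encoder would be trained
```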
#vq #image-generation