March 19, 2025
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
(Jensen (Jinghao) Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, Varun Jampani)
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene, given any number of input views and target cameras. Existing works struggle to generate either large viewpoint changes or temporally smooth samples, while relying on specific task configurations. Our approach overcomes these limitations through simple model design, optimized training recipe, and flexible sampling strategy that generalize across view synthesis tasks at test time. As a result, our samples maintain high consistency without requiring additional 3D representation-based distillation, thus streamlining view synthesis in the wild. Furthermore, we show that our method can generate high-quality videos lasting up to half a minute with seamless loop closure. Extensive benchmarking demonstrates that Seva outperforms existing methods across different datasets and settings.
A diffusion-based novel view synthesis method: it generates N target images from M input images and their camera poses. The main difficulty is keeping the samples 3D-consistent during sampling, which the authors address by routing generation through anchor frames.
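A minimal sketch of how I read that anchor-based procedure: first jointly sample a sparse set of anchor views conditioned on the inputs, then fill in the remaining target views chunk by chunk conditioned on the nearest anchors. `sample_fn`, the chunk size, and the anchor selection below are my assumptions for illustration, not the paper's actual interface.

```python
import numpy as np

def two_pass_sampling(input_views, input_cams, target_cams, sample_fn,
                      num_anchors=8, chunk=21):
    """Hedged sketch of anchor-based two-pass sampling (my reading, not the
    authors' code). `sample_fn(cond_views, cond_cams, query_cams)` stands in
    for one batched diffusion sampling call and is an assumed interface."""
    # Pass 1: jointly sample a sparse set of anchors spanning the trajectory,
    # conditioned only on the M input views; joint sampling keeps them mutually consistent.
    anchor_idx = np.linspace(0, len(target_cams) - 1, num_anchors, dtype=int)
    anchors = sample_fn(input_views, input_cams, [target_cams[i] for i in anchor_idx])
    outputs = dict(zip(anchor_idx.tolist(), anchors))

    # Pass 2: fill in the remaining targets chunk by chunk, conditioning each chunk
    # on the inputs plus its nearest anchors so neighbouring chunks agree.
    rest = [i for i in range(len(target_cams)) if i not in outputs]
    for start in range(0, len(rest), chunk):
        ids = rest[start:start + chunk]
        near = sorted({min(anchor_idx, key=lambda a: abs(a - i)) for i in ids})
        cond_views = list(input_views) + [outputs[a] for a in near]
        cond_cams = list(input_cams) + [target_cams[a] for a in near]
        views = sample_fn(cond_views, cond_cams, [target_cams[i] for i in ids])
        outputs.update(zip(ids, views))
    return [outputs[i] for i in range(len(target_cams))]
```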
#diffusion #novel-view-synthesis
RWKV-7 "Goose" with Expressive Dynamic State Evolution
(Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Haowen Hou, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng)
We present RWKV-7 "Goose", a new sequence modeling architecture, along with pre-trained language models that establish a new state-of-the-art in downstream performance at the 3 billion parameter scale on multilingual tasks, and match current SoTA English language performance despite being trained on dramatically fewer tokens than other top 3B models. Nevertheless, RWKV-7 models require only constant memory usage and constant inference time per token. RWKV-7 introduces a newly generalized formulation of the delta rule with vector-valued gating and in-context learning rates, as well as a relaxed value replacement rule. We show that RWKV-7 can perform state tracking and recognize all regular languages, while retaining parallelizability of training. This exceeds the capabilities of Transformers under standard complexity conjectures, which are limited to TC^0. To demonstrate RWKV-7's language modeling capability, we also present an extended open source 3.1 trillion token multilingual corpus, and train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on this dataset. To foster openness, reproduction, and adoption, we release our models and dataset component listing at https://huggingface.co/RWKV, and our training and inference code at https://github.com/RWKV/RWKV-LM all under the Apache 2.0 License.
RWKV-7 looks like a method in the DeltaNet / test-time-training family. The persistence of the RWKV author is truly remarkable.
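The abstract's "generalized formulation of the delta rule with vector-valued gating and in-context learning rates" can be pictured roughly as below. This is a hedged numpy sketch of that generalization only; the actual RWKV-7 parameterization (normalization, separate removal/replacement keys, the relaxed value replacement rule) differs in its details.

```python
import numpy as np

def generalized_delta_step(S, k, v, w, a, r):
    """Hedged sketch of a delta-rule state update with vector-valued gating (w)
    and a vector in-context learning rate (a), in the spirit of RWKV-7.
    Shapes: S is (d_v, d_k); k, r, w, a are (d_k,); v is (d_v,)."""
    k_hat = k / (np.linalg.norm(k) + 1e-8)  # normalized key used for removal
    # The classic delta rule is S <- S (I - beta k k^T) + beta v k^T with scalar beta.
    # Here the identity is replaced by per-channel gating diag(w), and the removal
    # strength becomes the per-channel vector a.
    transition = np.diag(w) - np.outer(k_hat, a * k_hat)  # (d_k, d_k)
    S = S @ transition + np.outer(v, k)                   # add the new key-value association
    return S, S @ r                                       # read out via receptance r
```

Constant memory and constant time per token follow directly from this form: the state S has fixed size regardless of sequence length.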
#state-space-model
Optimizing ML Training with Metagradient Descent
(Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry)
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
This paper uses metagradients, i.e., gradients taken through the model training process, to optimize training hyperparameters such as data selection. It has been a long time since I've seen this line of ideas.
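A toy sketch of the idea, assuming per-example data weights as the meta-parameters and a short, smooth inner training loop: differentiate the validation loss through the whole unrolled training run and take a gradient step on the weights. This is illustrative JAX code, not the authors' scalable metagradient algorithm or their smooth-training framework.

```python
import jax
import jax.numpy as jnp

def train(data_weights, X, y, steps=50, lr=0.1):
    """Inner loop: weighted least squares trained by full-batch gradient descent."""
    theta = jnp.zeros(X.shape[1])
    def inner_loss(theta):
        residual = X @ theta - y
        return jnp.mean(jax.nn.softplus(data_weights) * residual ** 2)  # smooth weighting
    for _ in range(steps):                     # unrolled, so it stays differentiable
        theta = theta - lr * jax.grad(inner_loss)(theta)
    return theta

def meta_loss(data_weights, X_tr, y_tr, X_val, y_val):
    """Outer objective: validation loss of the model produced by the inner loop."""
    theta = train(data_weights, X_tr, y_tr)
    return jnp.mean((X_val @ theta - y_val) ** 2)

# Metagradient: gradient of the validation loss w.r.t. per-example weights,
# taken through the entire (toy) training run, then one step of metagradient descent.
X_tr = jax.random.normal(jax.random.PRNGKey(0), (64, 4)); y_tr = X_tr @ jnp.ones(4)
X_val = jax.random.normal(jax.random.PRNGKey(1), (32, 4)); y_val = X_val @ jnp.ones(4)
w = jnp.zeros(64)
metagrad = jax.grad(meta_loss)(w, X_tr, y_tr, X_val, y_val)
w = w - 1.0 * metagrad
```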
#hyperparameter