March 20, 2025


SkyLadder: Better and Faster Pretraining via Context Window Scheduling

5 Comments

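Thanks to arXiv Daily, I get to read good papers every day.

Are there any other papers worth looking at on training with a progressively increasing batch size, or on Shortformer-style training?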

You've probably seen this already, but MiniMax-01 also computed "The power-law fit for the training loss and the critical batch size"!

Scaling laws for hyperparameters are a popular approach these days, so it's interesting that they connected them to batch size scheduling.

So even the most mainstream models were already using it, haha. Thanks!
