MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
May 19, 2025