MoE & video/3D efficiency scaling
Key Questions
How does the MoE Scaling Recipe guide expert count and granularity?
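Performance scales with total parameter count, while the optimal granularity (how finely capacity is split across experts) is chosen with respect to the active parameters used per token. Dropless routing, which processes every routed token instead of discarding those that overflow an expert's capacity, further improves efficiency and training stability in large MoE models.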
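To make the total-versus-active distinction and dropless routing concrete, here is a minimal sketch. It is not the recipe's own code: the MoELayer class, the layer sizes, and the definition of granularity as the dense FFN width divided by the expert width are all illustrative assumptions, and router weights and biases are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MoELayer:
    """Hypothetical MoE feed-forward layer used only for parameter counting."""
    d_model: int
    d_ff_dense: int   # hidden width of the dense FFN the experts replace (assumed)
    num_experts: int
    top_k: int        # experts activated per token
    granularity: int  # assumed definition: d_ff_dense / expert hidden width

    @property
    def d_expert(self) -> int:
        return self.d_ff_dense // self.granularity

    def expert_params(self) -> int:
        # Up and down projection matrices; biases and router are ignored.
        return 2 * self.d_model * self.d_expert

    def total_params(self) -> int:
        return self.num_experts * self.expert_params()

    def active_params(self) -> int:
        return self.top_k * self.expert_params()


def route_with_capacity(assignments, capacity):
    """Capacity-limited routing: tokens beyond an expert's capacity are dropped."""
    buckets = defaultdict(list)
    dropped = []
    for token, expert in enumerate(assignments):
        if len(buckets[expert]) < capacity:
            buckets[expert].append(token)
        else:
            dropped.append(token)
    return dict(buckets), dropped


def route_dropless(assignments):
    """Dropless routing: every token is kept, so expert batches vary in size."""
    buckets = defaultdict(list)
    for token, expert in enumerate(assignments):
        buckets[expert].append(token)
    return dict(buckets)


# Same total and active budgets, different granularity (illustrative sizes).
coarse = MoELayer(d_model=1024, d_ff_dense=4096, num_experts=8, top_k=2, granularity=1)
fine = MoELayer(d_model=1024, d_ff_dense=4096, num_experts=32, top_k=8, granularity=4)

# Top-1 expert choice per token for eight tokens (made-up, imbalanced on purpose).
assignments = [0, 0, 0, 1, 0, 2, 0, 1]
kept, dropped = route_with_capacity(assignments, capacity=2)
dropless = route_dropless(assignments)

print(coarse.total_params(), coarse.active_params())
print(fine.total_params(), fine.active_params())
print("dropped with capacity 2:", dropped)
print("dropless buckets:", dropless)
```

Both configurations have the same total and active parameter counts, so granularity can be varied at a fixed budget. With a capacity of two tokens per expert, three of the eight tokens are discarded, whereas the dropless variant keeps all of them at the cost of uneven expert batch sizes.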
What efficiency gains does LongLive-2.0 provide for long video generation?
LongLive-2.0 combines 4-bit (NVFP4) precision with parallelism to generate extended videos using less memory and compute. This makes high-resolution, long-horizon video synthesis practical to scale.
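As a rough illustration of why 4-bit storage reduces memory, the sketch below compares weight storage at 16 bits per value with a block-scaled 4-bit layout. The one-billion-parameter model is hypothetical and not a LongLive-2.0 figure, and the layout (one 8-bit scale per block of 16 values, as commonly described for NVFP4) is an assumption; the small per-tensor scale is ignored.

```python
def weight_bytes(num_params, bits_per_value, block_size=None, scale_bits=0):
    """Bytes needed to store weights, optionally with one scale per block."""
    bits = num_params * bits_per_value
    if block_size:
        num_blocks = -(-num_params // block_size)  # ceiling division
        bits += num_blocks * scale_bits
    return bits / 8


num_params = 1_000_000_000  # hypothetical model size, for illustration only

bf16_bytes = weight_bytes(num_params, bits_per_value=16)
# Assumed block-scaled 4-bit layout: 4-bit values, 8-bit scale per 16 values.
nvfp4_bytes = weight_bytes(num_params, bits_per_value=4, block_size=16, scale_bits=8)
ratio = bf16_bytes / nvfp4_bytes

print(f"16-bit weights: {bf16_bytes / 1e9:.2f} GB")
print(f"4-bit weights with block scales: {nvfp4_bytes / 1e9:.4f} GB")
print(f"reduction factor: {ratio:.2f}x")
```

Under these assumptions the scales add half a bit per value, so the saving on weights is about 3.6x rather than a full 4x. Activations and caches are not covered by this estimate.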
How does VGGT-Omega reduce memory usage in 3D reconstruction?
VGGT-Omega cuts memory consumption by up to 70% through architectures optimized for 3D tasks. This makes large-scale reconstruction more accessible without sacrificing quality.
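In short: under the MoE Scaling Recipe, performance scales with total parameters, optimal granularity is set with respect to active parameters, and routing is dropless. LongLive-2.0 uses NVFP4 for long video generation, and VGGT-Omega uses up to 70% less memory for 3D reconstruction.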