Generative Model Architectures and Post-Training Advances

Key Questions

What new generative models were announced?

Models include DreamX-World 1.0, GLM-5.2 with 1M context, VibeThinker-3B, MIRA (5B params at 20fps), and LingBot-World 2.0. NVIDIA Nemotron 3 Ultra and Gemma 4 12B also launched.

What advances occurred in flow matching and diffusion?

Flow Sampling (flow matching conditioned on the source), MrFlow (10x diffusion acceleration), and Multi-Resolution Flow Matching target training-free acceleration. Flash-BoN adds inference-time scaling for diffusion models.
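As background on why flow-based samplers can run in few steps, here is a minimal 1-D sketch of Euler sampling along a straight flow-matching path. It is a toy under stated assumptions, not the method of MrFlow, Flow Sampling, or Multi-Resolution Flow Matching: the target is a single point, and the names euler_sample and make_point_target_velocity are hypothetical.

```python
from typing import Callable


def euler_sample(velocity: Callable[[float, float], float],
                 x0: float, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x, dt = x0, 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t stays strictly below 1, so 1 - t is never zero
        x += dt * velocity(x, t)
    return x


def make_point_target_velocity(target: float) -> Callable[[float, float], float]:
    """Velocity field of the straight path x_t = (1 - t) * x0 + t * target.

    Assumption for illustration: the target distribution is a single point.
    """
    def velocity(x: float, t: float) -> float:
        # Along the straight path this equals the constant target - x0.
        return (target - x) / (1.0 - t)
    return velocity


velocity = make_point_target_velocity(target=3.0)
one_step = euler_sample(velocity, x0=-2.0, num_steps=1)
many_steps = euler_sample(velocity, x0=-2.0, num_steps=50)
```

Because the path is a straight line, one Euler step and fifty Euler steps land on the same endpoint up to rounding. Learned velocity fields are only approximately straight, so step count trades against quality in practice.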

How is MoE applied to video and embodied intelligence?

MoE shows up in two places: a routing method based on manifold power iteration, and DiT-based MoE video pretraining aimed at scaling embodied intelligence. Scaling Mixture-of-Experts Video Pretraining is the work listed for the second direction.
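For readers new to MoE routing, the sketch below shows conventional top-k softmax gating, where each token is sent to the k experts with the highest router scores. This is the standard baseline only; it is not the manifold power iteration routing mentioned above, and the function name top_k_route is hypothetical.

```python
import math
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Expert indices sorted by descending router score, truncated to k.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Subtract the largest selected logit for numerical stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# One token, four experts, two active experts per token.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
```

Only the selected experts run for a given token, which is how MoE models grow total parameter count without a matching growth in per-token compute.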

What is MIRA and its capabilities?

MIRA is a multiplayer interactive world model trained on Rocket League. It has 5B parameters and runs at 20fps on a single GPU.
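To put the frame rate in context, a frame rate converts directly into a per-frame time budget. The helper below is a small illustrative calculation, and frame_budget_ms is a hypothetical name. It uses the 20fps figure reported for MIRA and the 60fps figure this digest lists for LingBot-World 2.0.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce each frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


mira_budget = frame_budget_ms(20)     # 50.0 ms per frame
lingbot_budget = frame_budget_ms(60)  # about 16.7 ms per frame
```

The budget covers everything needed to emit a frame, so a model running at 20fps has 50 ms per frame for all of its inference work.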

What memory and architecture improvements were introduced?

HOLA (hybrid memory), Jet-Long (long-context extension), and work on linear attention architectures target memory and long-context efficiency. AdaJEPA enables continual world model learning.
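As general background on why linear attention helps with efficiency, the sketch below shows the generic kernelized form: keys and values are summarized once into a fixed-size state, so cost grows linearly with sequence length instead of quadratically. It assumes the common elu(x) + 1 feature map and a non-causal setting. It does not reproduce HOLA, Jet-Long, or any specific architecture named here, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def feature_map(x: List[float]) -> List[float]:
    """elu(x) + 1, which is strictly positive for every input."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Non-causal linear attention in O(n * d_k * d_v) time."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # sum of phi(k) v^T
    norm = [0.0] * d_k                         # sum of phi(k)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for a in range(d_k):
            norm[a] += fk[a]
            for b in range(d_v):
                state[a][b] += fk[a] * v[b]
    out = []
    for q in queries:
        fq = feature_map(q)
        denom = sum(fq[a] * norm[a] for a in range(d_k))
        out.append([sum(fq[a] * state[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out


def quadratic_reference(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Same result computed pairwise in O(n^2) time, for comparison only."""
    d_v = len(values[0])
    out = []
    for q in queries:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in keys]
        total = sum(weights)
        out.append([sum(w * v[b] for w, v in zip(weights, values)) / total
                    for b in range(d_v)])
    return out
```

Both functions return the same values up to rounding. The linear version never builds the n-by-n attention matrix, which is what makes very long contexts affordable.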

What post-training methods optimize visual generative models?

Two works target post-training: "Optimizing Visual Generative Models via Distribution-wise Rewards" and "From SRA to Self-Flow". A third, "Denser ≠ Better", examines the limits of on-policy self-distillation.

What interactive video generation models appeared?

Vidu S1 provides real-time interactive video generation. ARDY applies autoregressive diffusion to interactive human motion generation.

What long-horizon video techniques were developed?

LongE2V uses diffusion models for long-horizon, event-based video reconstruction and prediction.

New models and methods: DreamX-World 1.0, GLM-5.2 (1M context), VibeThinker-3B, Kairos, UniAR, FPRM, WEAVER, NVIDIA Nemotron 3 Ultra, Gemma 4 12B, MoE routing via manifold power iteration.

New today: MIRA (multiplayer world model, 5B params, 20fps on single GPU), LingBot-World 2.0 (infinite interactive worlds, 14B/1.3B, 720p 60fps), MoE video pretraining for embodied intelligence (DiT-based, open-source), Flow Sampling (conditioning on source for flow matching), MrFlow (10x diffusion acceleration), HOLA hybrid memory, Optimizing Visual Generative Models, Denser ≠ Better, From SRA to Self-Flow.

Also: AdaJEPA, CIFAR AI Chair (RWKV-7 for quantum optimization).
