Making video and image generation more controllable, efficient, and photorealistic
Smarter, Cinematic Generative Vision
This cluster tracks how generative vision models are evolving from pure image synthesis into controllable, reasoning-aware video systems. Papers introduce fine-grained control of motion and camera work for multi-shot, multi-subject video; real-time photorealism enhancement; and precise text- and glyph-guided image editing. Under the hood, new techniques such as adaptive video tokenization, elastic diffusion interfaces, endogenous chain-of-thought in diffusion, and cross-layer sparse attention reuse improve efficiency and reasoning quality. Together with broader coverage of AI video tools, these works point toward production-ready, cost-aware, and highly directable generative media pipelines.