NVIDIA SANA-WM Open World Model
Key Questions
What is NVIDIA SANA-WM?
SANA-WM is a 2.6B-parameter open-source world model developed by NVIDIA that generates controllable 720p videos up to one minute long from a single input image and a 6-DoF camera path.
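To make "6-DoF camera path" concrete, here is a minimal sketch of one way such a path could be represented: one pose per frame, with three translation values and three rotation values. The CameraPose class, the orbit_path helper, the Euler-angle encoding, and the convention that yaw 0 looks along +z are all assumptions made for illustration. They are not SANA-WM's actual input format, which the NVlabs/Sana repository defines.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical 6-DoF pose: 3 translation + 3 rotation degrees of freedom."""
    x: float      # translation, arbitrary scene units
    y: float
    z: float
    yaw: float    # rotation as Euler angles in radians (assumed convention)
    pitch: float
    roll: float


def orbit_path(num_frames, radius=2.0, sweep=math.pi / 2):
    """Hypothetical helper: move the camera along a horizontal arc around
    the origin, turning it so that it keeps facing the origin."""
    poses = []
    for i in range(num_frames):
        # t runs from 0.0 (first frame) to 1.0 (last frame).
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        angle = t * sweep
        x = radius * math.sin(angle)
        z = radius * math.cos(angle)
        # With forward = (sin(yaw), 0, cos(yaw)), this yaw points at the origin.
        yaw = math.atan2(-x, -z)
        poses.append(CameraPose(x, 0.0, z, yaw, 0.0, 0.0))
    return poses


# One pose per output frame; 5 frames keeps the example small.
path = orbit_path(num_frames=5)
```

A real one-minute clip would need one pose for every generated frame, so the path length depends on the model's frame rate, which is not stated here.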
How does SANA-WM produce video output?
It uses a Hybrid Linear Diffusion Transformer architecture built on the SANA-Video codebase, and it is efficient enough to generate the full video sequence on a single GPU.
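The "Linear" in the architecture name suggests linear attention, which is one reason long sequences can fit on a single GPU. The sketch below shows generic, textbook linear attention with a ReLU feature map and checks it against the equivalent quadratic form. It is an illustration of the general technique only: the function names and the feature map are assumptions, and nothing here is taken from the SANA-WM or SANA-Video code.

```python
import numpy as np


def feature_map(x):
    # ReLU plus a small epsilon keeps every feature strictly positive,
    # so the normaliser below can never be zero.
    return np.maximum(x, 0.0) + 1e-6


def linear_attention(q, k, v):
    """Generic non-causal linear attention.

    q, k: (N, d) arrays; v: (N, d_v) array.
    Cost grows linearly with the number of tokens N because the
    (N, N) attention matrix is never formed.
    """
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v              # (d, d_v) summary of all keys and values
    k_sum = k.sum(axis=0)     # (d,) used for normalisation
    return (q @ kv) / (q @ k_sum)[:, None]


def quadratic_reference(q, k, v):
    """Same result computed the expensive way, via an (N, N) weight matrix."""
    q, k = feature_map(q), feature_map(k)
    weights = q @ k.T
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4))
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 3))
out = linear_attention(q, k, v)   # shape (8, 3)
```

The two functions agree because matrix multiplication is associative: (QK^T)V equals Q(K^T V), and only the second grouping avoids the token-by-token matrix.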
Is SANA-WM available for public use?
Yes. The model is open source and is released through the NVlabs/Sana GitHub repository, so researchers and developers can access it and build on it.
What is Flash-GRPO in relation to SANA-WM?
Flash-GRPO is an efficient one-step policy optimization method that advances alignment techniques for video diffusion models, supporting the controllability features in SANA-WM.
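For readers unfamiliar with the GRPO family, the core idea is a group-relative advantage: several samples are generated for the same input, each is scored by a reward, and each sample's advantage is its reward standardised against the group. The sketch below shows only that generic step. The function name and the epsilon value are assumptions, and the one-step details specific to Flash-GRPO are not covered here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardise each reward against its own group.

    rewards: list of scalar rewards for samples generated from the
    same input. Returns one advantage per sample; samples that beat
    the group mean get a positive value, the rest a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)   # population standard deviation of the group
    # eps avoids division by zero when every sample has the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical reward scores for three videos from one prompt.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Because the baseline is the group mean, this approach needs no separate learned value network to estimate advantages.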
How does SANA-WM relate to simulation and agent perception?
SANA-WM echoes approaches such as UniVidX Video-JEPA, which positions it for use in simulation environments and for improving agent perception with generated video.
In summary, SANA-WM is a 2.6B-parameter open-source world model that generates controllable one-minute 720p video from a single image plus a camera path, echoing UniVidX Video-JEPA for simulation and agent perception. Separately, Incantation adds a natural-language (NL) action interface for multi-entity video models.