ML Research Pulse

NVIDIA SANA-WM Open World Model


Key Questions

What is NVIDIA SANA-WM?

SANA-WM is a 2.6B-parameter open-source world model developed by NVIDIA. From a single input image and a 6-DoF camera path (six degrees of freedom: 3D position plus orientation), it generates controllable 720p video up to one minute long.
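The source does not specify how SANA-WM encodes its camera path. As a rough illustration of what a "6-DoF camera path" is, here is a minimal sketch, assuming a hypothetical `Pose6DoF` record (position plus yaw/pitch/roll in degrees) and a hypothetical `interpolate_path` helper that resamples a few keyframes to one pose per output frame. Neither name comes from the SANA codebase.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pose6DoF:
    """Hypothetical camera pose: 3 translational + 3 rotational degrees of freedom."""
    x: float
    y: float
    z: float
    yaw: float    # degrees
    pitch: float  # degrees
    roll: float   # degrees


FIELDS = ("x", "y", "z", "yaw", "pitch", "roll")


def _lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b at fraction t in [0, 1]."""
    return a + (b - a) * t


def interpolate_path(keyframes: List[Pose6DoF], num_frames: int) -> List[Pose6DoF]:
    """Resample a keyframed camera path to exactly one pose per output frame.

    Euler angles are interpolated component-wise for simplicity; a real system
    would more likely interpolate rotations with quaternion slerp.
    """
    if len(keyframes) < 2 or num_frames < 2:
        raise ValueError("need at least two keyframes and two frames")
    segments = len(keyframes) - 1
    path = []
    for i in range(num_frames):
        s = i * segments / (num_frames - 1)   # global position along the path
        k = min(int(s), segments - 1)         # start keyframe of this segment
        t = s - k                             # fraction within the segment
        a, b = keyframes[k], keyframes[k + 1]
        path.append(Pose6DoF(*(_lerp(getattr(a, f), getattr(b, f), t) for f in FIELDS)))
    return path
```

For example, `interpolate_path([Pose6DoF(0, 0, 0, 0, 0, 0), Pose6DoF(0, 0, 10, 90, 0, 0)], 5)` yields a dolly forward along z while panning 90 degrees, with the middle frame at `Pose6DoF(0, 0, 5, 45, 0, 0)`.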

How does SANA-WM produce video output?

It uses a Hybrid Linear Diffusion Transformer architecture built on the SANA-Video codebase, and it is efficient enough to generate the full video sequence on a single GPU.
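The source does not say which layers are linear or how they are mixed. Assuming "Linear" refers to linear attention, as in earlier SANA models, which use a ReLU kernel, the key trick can be sketched as below. Keys and values are summarized once into a d × d_v matrix, so cost grows linearly with sequence length instead of quadratically. This is a generic illustration; `linear_attention` and `quadratic_reference` are hypothetical helpers, not SANA-WM code.

```python
from typing import List

Matrix = List[List[float]]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """ReLU-kernel linear attention: O(N * d * d_v) instead of O(N^2 * d)."""
    phi_q = [relu(row) for row in q]
    phi_k = [relu(row) for row in k]
    d, d_v = len(phi_k[0]), len(v[0])
    # Summarize all keys/values once: kv[a][b] = sum_j phi(k_j)[a] * v_j[b].
    kv = [[sum(pk[a] * vj[b] for pk, vj in zip(phi_k, v)) for b in range(d_v)]
          for a in range(d)]
    # Normalizer: k_sum[a] = sum_j phi(k_j)[a].
    k_sum = [sum(pk[a] for pk in phi_k) for a in range(d)]
    out = []
    for pq in phi_q:
        denom = dot(pq, k_sum) + eps
        out.append([sum(pq[a] * kv[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out


def quadratic_reference(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """Same ReLU kernel computed the O(N^2) way, to check equivalence."""
    out = []
    for qi in q:
        w = [dot(relu(qi), relu(kj)) for kj in k]  # one weight per key
        denom = sum(w) + eps
        out.append([sum(wj * vj[b] for wj, vj in zip(w, v)) / denom
                    for b in range(len(v[0]))])
    return out
```

Both functions compute the same weighted average of values; only the order of summation differs. That reordering is what makes long, high-resolution video token sequences affordable on one GPU.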

Is SANA-WM available for public use?

Yes, the model is open-source and released through the NVlabs/Sana GitHub repository, allowing researchers and developers to access and build upon it.

What is Flash-GRPO in relation to SANA-WM?

Flash-GRPO is an efficient one-step policy optimization method for aligning video diffusion models, and it supports the controllability features in SANA-WM.
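The source gives no details of Flash-GRPO itself. For orientation, the step that gives GRPO (Group Relative Policy Optimization) its name can be sketched as follows: several samples are generated from the same conditioning input, and each sample's reward is normalized against its own group, so no learned value model is needed. This shows only the standard GRPO advantage computation; the "one-step" optimization that distinguishes Flash-GRPO is not modeled, and `group_relative_advantages` is a hypothetical helper.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantages: normalize rewards within one sampling group.

    All rewards are assumed to score samples generated from the same
    conditioning input (for a world model, e.g. the same image and camera path).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # Above-average samples get positive advantage, below-average negative.
    return [(r - mean) / (std + eps) for r in rewards]
```

For rewards `[1.0, 2.0, 3.0, 4.0]`, the advantages sum to zero and the best sample receives the largest positive weight; if every sample scores the same, all advantages are zero and the group contributes no gradient signal.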

How does SANA-WM relate to simulation and agent perception?

Its design echoes approaches such as UniVidX Video-JEPA, positioning SANA-WM for use in simulation environments and for improving agent perception through generated worlds.

2.6B-parameter open-source world model generates controllable one-minute 720p video from a single image plus a camera path; echoes UniVidX Video-JEPA for simulation and agent perception. Separately, Incantation adds a natural-language (NL) action interface for multi-entity video models.

Updated May 20, 2026