AI Model Release Tracker

SANA-WM & TRELLIS.2 multimodal vision

Key Questions

What is SANA-WM and its main capability?

SANA-WM is a 2.6B-parameter open-source world model that generates minute-scale 720p video from a single image and a camera path. It runs on a single GPU such as an RTX 5090. GDN attention and dual camera branches improve controllability.
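To make the "single image plus camera path" input more concrete, here is a minimal Python sketch of what a camera path could look like as data: one pose per output frame along a simple arc. This is an illustration only and not SANA-WM's actual input format. The `CameraPose` class, the `orbit_path` helper, and the frame rate of 16 fps are all hypothetical assumptions; the source states only the one-minute duration and 720p resolution.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical per-frame camera pose (not SANA-WM's real schema)."""
    x: float
    y: float
    z: float
    yaw_deg: float


def orbit_path(num_frames: int, radius: float = 2.0,
               sweep_deg: float = 90.0) -> list[CameraPose]:
    """Return one pose per frame along a horizontal arc around the origin.

    The yaw value grows linearly from 0 to sweep_deg across the frames.
    """
    poses = []
    for i in range(num_frames):
        # Normalised progress along the path, from 0.0 to 1.0.
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        angle = math.radians(sweep_deg * t)
        poses.append(CameraPose(
            x=radius * math.sin(angle),
            y=0.0,
            z=radius * math.cos(angle),
            yaw_deg=sweep_deg * t,
        ))
    return poses


ASSUMED_FPS = 16   # assumption: the source does not state a frame rate
DURATION_S = 60    # "minute-scale" video, per the description above

path = orbit_path(ASSUMED_FPS * DURATION_S)
print(len(path), path[0], path[-1])
```

A real pipeline would pair a path like this with the input image and convert it into whatever conditioning format the model's generation scripts expect.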

How does SANA-WM perform on consumer hardware?

It produces one-minute 720p videos using a single consumer GPU. The model is designed for accessible world simulation without data-center resources. A GitHub repo provides code and weights.

What is TRELLIS.2?

TRELLIS.2 is a 4B-parameter open image-to-3D model that uses the O-Voxel representation. Its weights are released on Hugging Face for community use. It advances open multimodal vision generation.

Where is the SANA-WM code available?

The official GitHub repository, NVlabs/Sana, contains the model and generation scripts. The model supports controllable video synthesis from images. Community ports enable quick testing on an RTX 5090.

What makes SANA-WM notable compared to prior video models?

It produces minute-long 720p output at low compute cost. The open release lowers barriers for world-model research. Its dual camera branches improve camera and scene consistency.

NVIDIA's SANA-WM is a 2.6B open world model that generates minute-scale 720p video on a single GPU (RTX 5090) using GDN attention and dual camera branches, with code available in a GitHub repo. TRELLIS.2 is a 4B open image-to-3D model based on O-Voxel, with weights on Hugging Face.

Updated May 16, 2026