SANA-WM & TRELLIS.2 multimodal vision
Key Questions
What is SANA-WM and its main capability?
SANA-WM is a 2.6B-parameter open-source world model that generates minute-scale 720p video from a single image and a camera path. It runs on a single GPU such as an RTX 5090. GDN attention and dual camera branches improve controllability.
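To make "camera path" concrete, here is a minimal sketch of one common way to describe such a path: a list of per-frame 4x4 camera-to-world matrices. Everything in it is an assumption for illustration. The function names, the orbit motion, and the OpenGL-style axis convention (camera looks down its local -z axis) are hypothetical and are not taken from the SANA-WM code, which may expect a different format.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Build a 4x4 camera-to-world matrix (nested lists, row-major).

    Assumed convention: columns are right, up, back (-forward), position.
    """
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, up))
    true_up = _cross(right, forward)
    back = tuple(-x for x in forward)
    return [
        [right[0], true_up[0], back[0], eye[0]],
        [right[1], true_up[1], back[1], eye[1]],
        [right[2], true_up[2], back[2], eye[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

def orbit_camera_path(num_frames, radius=2.0, height=0.5, sweep_deg=90.0):
    """Hypothetical path: orbit the origin by sweep_deg, always looking at it."""
    poses = []
    for i in range(num_frames):
        t = i / max(num_frames - 1, 1)           # 0.0 at first frame, 1.0 at last
        theta = math.radians(sweep_deg) * t
        eye = (radius * math.sin(theta), height, radius * math.cos(theta))
        poses.append(look_at(eye, (0.0, 0.0, 0.0)))
    return poses

if __name__ == "__main__":
    path = orbit_camera_path(num_frames=5)
    print(len(path), "poses; first camera position:",
          [row[3] for row in path[0][:3]])
```

A path like this would be paired with the input image, one pose per generated frame. The exact pose format a given model accepts should be checked against its own documentation.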
How does SANA-WM perform on consumer hardware?
It produces one-minute 720p videos on a single consumer GPU, so world simulation does not require data-center resources. A GitHub repo provides the code and weights.
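A quick back-of-the-envelope calculation shows why a one-minute 720p clip is a demanding target. The sketch below assumes the standard 1280x720 frame size for 720p and a frame rate of 16 fps. The frame rate is an assumption chosen only for illustration, since the text above does not state one, and the function name is hypothetical.

```python
def video_budget(seconds=60, fps=16, width=1280, height=720, channels=3):
    """Return (frames, pixels_per_frame, raw_bytes) for an uncompressed 8-bit clip.

    fps=16 is an illustrative assumption, not a documented SANA-WM setting.
    """
    frames = seconds * fps
    pixels_per_frame = width * height
    raw_bytes = frames * pixels_per_frame * channels  # one byte per channel
    return frames, pixels_per_frame, raw_bytes

if __name__ == "__main__":
    frames, pixels, raw = video_budget()
    print(f"{frames} frames, {pixels} pixels per frame, "
          f"{raw / 1e9:.2f} GB uncompressed")
```

Under these assumptions a single clip is 960 frames of 921,600 pixels each, about 2.65 GB of raw RGB data, which is the amount of output that has to stay consistent across the whole minute.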
What is TRELLIS.2?
TRELLIS.2 is a 4B-parameter open image-to-3D model that uses an O-Voxel representation. Its weights are released on Hugging Face for community use. It advances open multimodal vision generation.
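Weights hosted on Hugging Face can generally be fetched with the `huggingface_hub` package. The following is a minimal sketch, assuming that package is installed. The wrapper name `fetch_weights` is hypothetical, and the repository id is deliberately left as a parameter because the exact id is not given here.

```python
def fetch_weights(repo_id, local_dir="weights"):
    """Download every file of a Hugging Face model repo and return the local path.

    repo_id is supplied by the caller; look up the real id on the model's page.
    """
    if not repo_id:
        raise ValueError("repo_id must be a non-empty string")
    # Imported lazily so the module still loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)

if __name__ == "__main__":
    # Placeholder id for illustration only; replace it with the actual repository.
    print(fetch_weights("<org>/<model-name>"))
```

A 4B-parameter checkpoint is a multi-gigabyte download, so it is worth pointing `local_dir` at a disk with enough free space.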
Where is the SANA-WM code available?
The official GitHub repository, NVlabs/Sana, contains the model and generation scripts for controllable image-to-video synthesis. Community ports enable quick testing on an RTX 5090.
What makes SANA-WM notable compared to prior video models?
It produces minute-long, high-resolution output at low compute cost. The open release lowers the barrier to world-model research. The dual camera branches improve camera and scene consistency.
Summary: NVIDIA SANA-WM is a 2.6B open world model that generates minute-scale 720p video on a single GPU (RTX 5090), with GDN attention, dual camera branches, and a GitHub repo. TRELLIS.2 is a 4B open image-to-3D model using O-Voxel, with weights on Hugging Face.