Open LLM Deploy

NVIDIA releases Nemotron 3 Ultra: 550B MoE open model for long-running agents

Key Questions

What is NVIDIA Nemotron 3 Ultra?

Nemotron 3 Ultra is a 550B-parameter Mixture-of-Experts (MoE) model with 55B active parameters, released by NVIDIA as open weights. It uses a hybrid Mamba-Attention architecture and targets long-running agents.

What performance improvements does Nemotron 3 Ultra offer?

It delivers up to 6x higher throughput, attributed to NVFP4 quantization and the hybrid architecture. The model is positioned to challenge leaders such as Kimi K2.6 and Qwen 3.5.

What hardware is needed to run Nemotron 3 Ultra locally?

With quantization it may fit in 32-64 GB of VRAM, though the exact requirements still need verification. It is designed for open-source local deployment scenarios.
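
As a rough sanity check on the VRAM figure, weight memory can be estimated as parameters times bits per parameter, divided by 8. The sketch below assumes about 4 bits per weight for NVFP4 and ignores scaling-factor overhead, KV cache, and activations. The function name and constants are illustrative, not taken from any NVIDIA tooling.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Estimate memory for the weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Parameter counts as stated in the article.
TOTAL_PARAMS = 550e9   # all experts
ACTIVE_PARAMS = 55e9   # parameters used per token

# Assumption: roughly 4 bits per weight, no scale-factor overhead.
BITS_PER_PARAM = 4

total_gb = weight_memory_gb(TOTAL_PARAMS, BITS_PER_PARAM)
active_gb = weight_memory_gb(ACTIVE_PARAMS, BITS_PER_PARAM)

print(f"All weights:    {total_gb:.1f} GB")
print(f"Active weights: {active_gb:.1f} GB")
```

Under these assumptions the full set of weights comes to about 275 GB, while the 55B active parameters come to about 27.5 GB. The 32-64 GB range would therefore be plausible only if the inactive experts are kept outside VRAM, for example in system memory. That reading is an inference and is not confirmed by the source.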

NVIDIA's Nemotron 3 Ultra (550B MoE, 55B active) is an open-weight model that combines a hybrid Mamba-Attention architecture with NVFP4 quantization for up to 6x throughput gains. It targets long-running agents and could fit in 32-64 GB of VRAM with quantization, although the actual VRAM requirements still need verification. It challenges existing leaders such as Kimi K2.6 and Qwen 3.5 and is a significant new open-source release for local deployment.

Updated Jun 5, 2026