NVIDIA Nemotron 3 Ultra: open MoE hybrid Mamba-Transformer for agentic reasoning

Key Questions

What architecture does Nemotron 3 Ultra use?

It is a 550B-parameter mixture-of-experts (MoE) model with 55B active parameters, built on a hybrid Mamba-Transformer architecture. It also incorporates NVFP4 quantization and Multi-Teacher On-Policy Distillation.
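
As a rough illustration of what the quoted parameter counts imply, the sketch below computes the fraction of parameters that are active and a back-of-the-envelope weight footprint. It assumes 4 bits per weight for NVFP4 and 16 bits for BF16 and ignores scaling-factor overhead, activations, and caches, so the results are estimates, not published figures. The function names are illustrative only.

```python
# Parameter counts quoted in the answer above.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9

# Assumption: nominal bits per weight; NVFP4 block scale factors are ignored.
BITS_PER_WEIGHT = {"bf16": 16, "nvfp4": 4}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that are active."""
    return active_params / total_params


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.0%}")
    for fmt, bits in BITS_PER_WEIGHT.items():
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{fmt}: ~{gb:.0f} GB of weights")
```

Under these assumptions, one tenth of the parameters are active, and 4-bit storage needs roughly a quarter of the weight memory of 16-bit storage.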

What efficiency gains does Nemotron 3 Ultra offer?

Reported benchmarks show 5x the throughput of peer models and up to 30% cost savings. The model is optimized specifically for long-running agentic reasoning tasks.
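
To make the two headline figures concrete, the sketch below applies them to a made-up baseline. The baseline throughput and price are hypothetical placeholders, not measurements from any benchmark, and the two claims are treated as independent because the source does not say how they relate.

```python
# Figures quoted in the answer above.
THROUGHPUT_MULTIPLIER = 5.0   # "5x throughput"
MAX_COST_SAVINGS = 0.30       # "up to 30% cost savings"

# Hypothetical baseline for a peer model (placeholders, not real data).
BASELINE_TOKENS_PER_S = 100.0
BASELINE_COST_PER_M_TOKENS = 10.0


def generation_time_s(num_tokens: int, tokens_per_s: float) -> float:
    """Seconds needed to generate num_tokens at a steady throughput."""
    return num_tokens / tokens_per_s


def discounted_cost(baseline_cost: float, savings_fraction: float) -> float:
    """Cost after applying a fractional saving to a baseline cost."""
    return baseline_cost * (1.0 - savings_fraction)


if __name__ == "__main__":
    tokens = 1_000_000
    slow = generation_time_s(tokens, BASELINE_TOKENS_PER_S)
    fast = generation_time_s(
        tokens, BASELINE_TOKENS_PER_S * THROUGHPUT_MULTIPLIER
    )
    print(f"time for {tokens} tokens: {slow:.0f} s -> {fast:.0f} s")
    best = discounted_cost(BASELINE_COST_PER_M_TOKENS, MAX_COST_SAVINGS)
    print(f"cost per 1M tokens: {BASELINE_COST_PER_M_TOKENS} -> {best:.2f}")
```

For a long-running agent that emits millions of tokens per task, the throughput multiplier shortens wall-clock time directly, while the cost figure is an upper bound on savings rather than a guaranteed reduction.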

Are the weights for Nemotron 3 Ultra publicly available?

Yes. Open weights have been released, and the model is available on platforms such as Amazon SageMaker JumpStart and Hugging Face.

What context length does Nemotron 3 Ultra support?

The model supports a 1M-token context window, making it suitable for extended agentic and reasoning workloads.

Who released Nemotron 3 Ultra and for what use cases?

NVIDIA released the model to advance open-weight AI with strong capability-to-efficiency ratios, targeting agentic reasoning and production deployments.

NVIDIA released Nemotron 3 Ultra, a 550B-parameter MoE model with 55B active parameters, optimized for long-running agents. It uses a hybrid Mamba-Transformer architecture, NVFP4 quantization, and Multi-Teacher On-Policy Distillation. Reported benchmarks show 5x throughput and up to 30% cost savings versus peer models. The weights are open, and the model is now available on Amazon SageMaker JumpStart for one-click deployment. Sebastian Raschka highlights its efficiency ratio and agentic post-training. The release fits a focus on open-source model releases and agentic AI.
