NVIDIA Nemotron 3 Ultra: open hybrid Mamba-Transformer MoE for agentic reasoning
Key Questions
What architecture does Nemotron 3 Ultra use?
It is a 550B-parameter mixture-of-experts (MoE) model with 55B active parameters per token, built on a hybrid Mamba-Transformer architecture. It also incorporates NVFP4 quantization and Multi-Teacher On-Policy Distillation.
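The split between 550B total and 55B active parameters is what a mixture-of-experts layer provides. A router sends each token to only a few experts, so most expert weights sit idle on any given step. The toy sketch below shows generic top-k routing in pure Python. The expert count, `TOP_K`, hidden size, and every function name are hypothetical. They were chosen only so that 2 of 20 experts (10%) are active, echoing the 55B/550B ratio. This is not Nemotron 3 Ultra's actual router. In real MoE models the active count typically also includes non-expert weights that every token uses.

```python
import math
import random

# Hypothetical toy sizes (not Nemotron 3 Ultra's real configuration):
# 2 of 20 experts per token means 10% of expert weights are active.
NUM_EXPERTS = 20
TOP_K = 2
HIDDEN = 4


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def moe_forward(x, router, experts, top_k=TOP_K):
    """Send one token to its top-k experts and mix their outputs by gate weight."""
    logits = matvec(router, x)  # one routing score per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over chosen experts only
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        # Only the chosen experts run; the other experts' weights are untouched.
        y = matvec(experts[i], x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen, gates


def active_fraction(top_k, num_experts):
    return top_k / num_experts


rng = random.Random(0)
router = random_matrix(NUM_EXPERTS, HIDDEN, rng)
experts = [random_matrix(HIDDEN, HIDDEN, rng) for _ in range(NUM_EXPERTS)]

token = [0.5, -0.2, 0.1, 0.9]
output, chosen, gates = moe_forward(token, router, experts)
print("experts used:", chosen)
print("active fraction of expert weights:", active_fraction(TOP_K, NUM_EXPERTS))
```

This sketch renormalizes the gates over only the chosen experts, which is one common convention. Other MoE designs apply the softmax before top-k selection instead.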
What efficiency gains does Nemotron 3 Ultra offer?
Benchmarks show 5x throughput and up to 30% cost savings compared to peer models. It is optimized specifically for long-running agentic reasoning tasks.
Are the weights for Nemotron 3 Ultra publicly available?
Yes. NVIDIA has released open weights, and the model is available on platforms such as Amazon SageMaker JumpStart and Hugging Face.
What context length does Nemotron 3 Ultra support?
The model supports a 1M token context window, making it suitable for extended agentic and reasoning workloads.
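At a 1M-token window the hybrid design matters most. Each attention layer must cache keys and values for every token, while a Mamba layer carries a fixed-size state regardless of context length. The back-of-envelope calculator below makes that concrete. All configuration numbers are hypothetical and are not Nemotron 3 Ultra's published configuration: 64 layers, 8 of them attention layers in the hybrid, 8 KV heads, head dimension 128, and 2-byte BF16 entries. Only the scaling behavior is the point.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    """Memory for cached keys and values across all attention layers."""
    # Factor 2: one cached tensor for keys and one for values.
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch


SEQ_LEN = 1_000_000  # a "1M-token" context

# Hypothetical configs: both have 64 layers, but the hybrid keeps attention in only 8
# of them. Mamba layers hold a fixed-size state that does not grow with SEQ_LEN, so
# they add nothing to this per-token cache.
all_attention = kv_cache_bytes(attn_layers=64, kv_heads=8, head_dim=128, seq_len=SEQ_LEN)
hybrid = kv_cache_bytes(attn_layers=8, kv_heads=8, head_dim=128, seq_len=SEQ_LEN)

print(f"all-attention KV cache: {all_attention / 1e9:.1f} GB")  # 262.1 GB
print(f"hybrid KV cache:        {hybrid / 1e9:.1f} GB")         # 32.8 GB
print(f"reduction: {all_attention // hybrid}x")                  # 8x
```

The cache grows linearly with both context length and the number of attention layers. Moving most layers to Mamba therefore shrinks long-context memory roughly in proportion to the attention layers removed.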
Who released Nemotron 3 Ultra and for what use cases?
NVIDIA released the model to advance open-weight AI with strong capability-to-efficiency ratios, targeting agentic reasoning and production deployments.
NVIDIA released Nemotron 3 Ultra, a 550B-parameter MoE model with 55B active parameters, optimized for long-running agents. It combines a hybrid Mamba-Transformer architecture, NVFP4 quantization, and Multi-Teacher On-Policy Distillation. Benchmarks show 5x throughput and up to 30% cost savings versus peer models. Open weights are available, and the model is now offered on Amazon SageMaker JumpStart for one-click deployment. Sebastian Raschka highlights its efficiency ratio and agentic post-training. The release fits squarely with a focus on open-source model releases and agentic AI.
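NVFP4 stores weights as 4-bit floats grouped into small blocks that share a scale factor. The sketch below simulates ("fake-quantizes") that idea, assuming the commonly described NVFP4 layout of E2M1 values (magnitudes 0 to 6) in 16-element blocks. To stay short, it makes three simplifications. It keeps each block scale in full precision rather than rounding it to FP8 (E4M3), it omits the second-level per-tensor scale, and it breaks rounding ties toward the smaller magnitude. Treat it as an illustration, not NVIDIA's kernel or quantization recipe. The function names are hypothetical.

```python
# Magnitudes a 4-bit E2M1 float can represent; the sign is stored separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
BLOCK_SIZE = 16  # elements sharing one scale factor


def quantize_block(block):
    """Map a block to (scale, codes), with each code a signed E2M1 grid value."""
    amax = max(abs(v) for v in block)
    # Simplification: real NVFP4 rounds this scale to FP8 (E4M3).
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for v in block:
        # Nearest grid point; min() breaks ties toward the smaller magnitude.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def dequantize_block(scale, codes):
    return [c * scale for c in codes]


def fake_quantize(values):
    """Quantize then dequantize, block by block, to expose the rounding error."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out


weights = [0.01 * i - 0.15 for i in range(32)]  # two 16-element blocks
approx = fake_quantize(weights)
max_err = max(abs(w - a) for w, a in zip(weights, approx))
print(f"max abs error: {max_err:.4f}")
```

Each block gets its own scale, so an outlier only inflates rounding error within its own 16 elements rather than across the whole tensor.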