NVIDIA Nemotron 3 Super is an open hybrid MoE model with 120B total parameters, only 12B of which are active per token, combining high computational efficiency with strong accuracy in complex multi-agent scenarios. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it generates tokens more than 50% faster than leading open models. A 1M-token context window enables long-term agent consistency, cross-document reasoning, and multi-step task planning. Latent MoE lets the model engage four experts at the inference cost of just one, improving both accuracy and generalization. Reinforcement learning across more than 10 environments delivers top-tier results on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Released fully open, with weights, datasets, and training recipes under the NVIDIA Open License, Nemotron 3 Super supports straightforward customization and secure deployment in any environment, from local workstations to the cloud.
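
To make the active-parameter idea concrete, here is a minimal sketch of generic top-k expert routing in a sparse MoE layer: only the k selected expert networks run for a given token, so the active parameter count stays a small fraction of the total. This is an illustrative toy under assumed shapes and names, not Nemotron's actual latent MoE routing.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, k = 16, 8, 4                            # toy dimensions (assumptions)

gate_w = rng.standard_normal((d, num_experts))          # router weights
expert_w = rng.standard_normal((num_experts, d, d))     # one weight matrix per expert

def moe_forward(x):
    """Route one token through the top-k experts of a sparse MoE layer.

    Only k of num_experts expert networks execute, so the active
    parameter count is a small fraction of the layer's total.
    """
    logits = x @ gate_w                    # router scores, shape (num_experts,)
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                           # softmax over the selected experts only
    # Idle experts contribute nothing and cost nothing for this token.
    return sum(wi * (x @ expert_w[i]) for wi, i in zip(w, top))

y = moe_forward(rng.standard_normal(d))
print(y.shape)  # (16,)
```

Under a scheme like this, a model such as Nemotron 3 Super touches roughly 10% of its weights per token (12B of 120B), which is where the efficiency gain comes from.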
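
Because the weights are released openly, deployment can be as simple as loading the checkpoint with a standard inference stack. The sketch below assumes a Hugging Face transformers workflow; the model ID is hypothetical, and the exact repository name, precision, and whether custom model code is required may differ from what is shown.

```python
# Hypothetical usage sketch with Hugging Face transformers.
# The model ID below is an assumption, not a confirmed repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-3-Super"  # hypothetical repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # use the checkpoint's native precision
    device_map="auto",        # shard across available GPUs
    trust_remote_code=True,   # hybrid Mamba-Transformer blocks may need custom code
)

prompt = "Plan the steps to refactor a multi-agent pipeline:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```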