Nemotron 3 Super (free) vs Nemotron 3 Ultra (free)

Compare Nemotron 3 Super (free) and Nemotron 3 Ultra (free) on key metrics including price, context length, throughput, and other model features.

AuthorNvidia

Context Length262.1k

Supports Tools

NVIDIA Nemotron 3 Super is an open hybrid MoE model with 120B parameters, using only 12B active parameters to achieve high computational efficiency and strong accuracy in complex multi-agent scenarios. Based on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it offers more than 50% faster token generation than leading open models. The model includes a 1M-token context window, enabling long-term agent consistency, cross-document reasoning, and multi-step task planning. Latent MoE makes it possible to engage 4 experts at the inference cost of just one, enhancing both intelligence and generalization. Reinforcement learning across more than 10 environments provides top-tier benchmark performance, including AIME 2025, TerminalBench, and SWE-Bench Verified. Released fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super supports simple customization and secure deployment in any environment — from local workstations to the cloud.

Activity

Last 14 days

Prompt

408M

Completion

24M

Total

432M

Startup

Nvidia

Latency (p50)1.90s

Throughput (p50)22.7 tok/s

Pricing

InputFree

OutputFree

Features

Input Modalitiestext

Output Modalitiestext

Supported EndpointsChat Completions

Vision

Supports Tools

Go to model

AuthorNvidia

Context Length1M

Supports Tools

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex enterprise tasks. It is particularly strong at multi-step reasoning and planning, with high-throughput inference designed for high-volume agent pipelines. It is part of the NVIDIA Nemotron family of open models for agentic AI.

Activity

Last 14 days

Prompt

468M

Completion

Total

477M

Startup

Nvidia

Latency (p50)62.09s

Throughput (p50)3.9 tok/s

Pricing

InputFree

OutputFree

Features

Input Modalitiestext

Output Modalitiestext

Supported EndpointsChat Completions

Vision

Supports Tools

Go to model