Compare GLM 4.5 Air (free) and Nemotron 3 Super (free) on key metrics including price, context length, throughput, and other model features.
GLM-4.5-Air is the lightweight member of the GLM-4.5 flagship model family, designed for agent-focused applications. Like GLM-4.5, it uses a Mixture-of-Experts (MoE) architecture, but with a smaller parameter footprint. GLM-4.5-Air also supports hybrid inference: a "thinking mode" for deeper reasoning and tool use, and a "non-thinking mode" for low-latency, real-time interactions.
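The hybrid inference modes are typically selected per request. As a minimal sketch, assuming an OpenRouter-style chat-completions endpoint and a `reasoning` toggle for the thinking mode (the model slug and parameter shape here are assumptions, not confirmed details from this page):

```python
import json

# Hypothetical endpoint; payloads are built locally and not sent here.
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat-completion payload, switching between the model's
    "thinking" mode (deeper reasoning / tool use) and its
    "non-thinking" mode (low-latency, real-time responses)."""
    return {
        # Assumed free-tier slug for GLM-4.5-Air.
        "model": "z-ai/glm-4.5-air:free",
        "messages": [{"role": "user", "content": prompt}],
        # Assumed toggle for the hybrid "thinking" mode.
        "reasoning": {"enabled": thinking},
    }

payload = build_request("Plan a three-step web-research agent.", thinking=True)
print(json.dumps(payload, indent=2))
```

In practice you would POST this payload with your API key; the non-thinking mode is the same request with `thinking=False`.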
NVIDIA Nemotron 3 Super is an open hybrid MoE model with 120B total parameters, of which only 12B are active per token, giving high computational efficiency and strong accuracy in complex multi-agent scenarios. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it offers more than 50% faster token generation than leading open models. A 1M-token context window enables long-term agent consistency, cross-document reasoning, and multi-step task planning. Its latent-MoE design engages four experts at the inference cost of one, improving both intelligence and generalization. Reinforcement learning across more than 10 environments yields top-tier performance on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Released fully open, with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super supports straightforward customization and secure deployment in any environment, from local workstations to the cloud.
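A 1M-token window is large enough that whole document sets can go into a single request, but it is still finite. A minimal sketch of a pre-flight budget check, assuming the common heuristic of roughly 4 characters per token (an approximation, not an exact tokenizer count):

```python
# Rough context-budget check for a 1M-token window.
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # assumed average for English prose; varies by tokenizer

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(docs: list[str], reserve_for_output: int = 8_192) -> bool:
    """True if the concatenated documents, plus a reserved output
    budget, stay within the context window."""
    total = sum(estimate_tokens(d) for d in docs)
    return total + reserve_for_output <= CONTEXT_WINDOW

docs = ["word " * 50_000, "word " * 120_000]  # ~850k characters total
print(fits_in_context(docs))
```

For production use you would replace the heuristic with the model's actual tokenizer; the structure of the check stays the same.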