Compare GLM 4.5 Air (free) and Nemotron 3 Super (free) on key metrics including price, context length, throughput, and other model features.
GLM-4.5-Air is the lightweight member of the GLM-4.5 flagship model family, designed for agent-focused applications. Like GLM-4.5, it uses a Mixture-of-Experts (MoE) architecture, but with a smaller parameter footprint. GLM-4.5-Air also supports hybrid inference: a "thinking mode" for deeper reasoning and tool use, and a "non-thinking mode" for low-latency, real-time interactions.
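The hybrid inference modes are typically selected per request. As a minimal sketch, assuming an OpenRouter-style chat-completions endpoint and a `reasoning` toggle for the thinking mode (the model slug and parameter shape here are assumptions, not confirmed details from this page):

```python
import json

# Hypothetical endpoint; payloads are built locally and not sent here.
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat-completion payload, switching between the model's
    "thinking" mode (deeper reasoning / tool use) and its
    "non-thinking" mode (low-latency, real-time responses)."""
    return {
        # Assumed free-tier slug for GLM-4.5-Air.
        "model": "z-ai/glm-4.5-air:free",
        "messages": [{"role": "user", "content": prompt}],
        # Assumed toggle for the hybrid "thinking" mode.
        "reasoning": {"enabled": thinking},
    }

payload = build_request("Plan a three-step web-research agent.", thinking=True)
print(json.dumps(payload, indent=2))
```

In practice you would POST this payload with your API key; the non-thinking mode is the same request with `thinking=False`.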
NVIDIA Nemotron 3 Super is an open hybrid MoE model with 120B total parameters, of which only 12B are active per token, giving high computational efficiency and strong accuracy in complex multi-agent scenarios. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it offers more than 50% faster token generation than leading open models. A 1M-token context window enables long-term agent consistency, cross-document reasoning, and multi-step task planning. Its latent-MoE design engages four experts at the inference cost of one, improving both intelligence and generalization. Reinforcement learning across more than 10 environments yields top-tier performance on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Released fully open, with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super supports straightforward customization and secure deployment in any environment, from local workstations to the cloud.
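A 1M-token window is large enough that whole document sets can go into a single request, but it is still finite. A minimal sketch of a pre-flight budget check, assuming the common heuristic of roughly 4 characters per token (an approximation, not an exact tokenizer count):

```python
# Rough context-budget check for a 1M-token window.
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # assumed average for English prose; varies by tokenizer

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(docs: list[str], reserve_for_output: int = 8_192) -> bool:
    """True if the concatenated documents, plus a reserved output
    budget, stay within the context window."""
    total = sum(estimate_tokens(d) for d in docs)
    return total + reserve_for_output <= CONTEXT_WINDOW

docs = ["word " * 50_000, "word " * 120_000]  # ~850k characters total
print(fits_in_context(docs))
```

For production use you would replace the heuristic with the model's actual tokenizer; the structure of the check stays the same.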