GLM 4.5 Air (free) vs Nemotron 3 Ultra (free)

Compare GLM 4.5 Air (free) and Nemotron 3 Ultra (free) on key metrics including price, context length, throughput, and other model features.

Author: Z.ai

Context Length: 131.1k

Supports Tools

GLM-4.5-Air is the lightweight version of Z.ai's newest flagship model family, designed specifically for agent-focused applications. Like GLM-4.5, it uses a Mixture-of-Experts (MoE) architecture, but with a smaller parameter footprint. GLM-4.5-Air also supports hybrid inference modes: a "thinking mode" for deeper reasoning and tool usage, and a "non-thinking mode" for real-time interactions.

Activity (last 14 days)

Prompt: 545M

Completion: 35M

Total: 580M

Startup

Z.ai

Latency (p50): 9.00s

Throughput (p50): 22.5 tok/s

Pricing

Input: Free

Output: Free

Features

Input Modalities: text

Output Modalities: text

Supported Endpoints: Chat Completions

Vision

Supports Tools

Author: Nvidia

Context Length: 1M

Supports Tools

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex enterprise tasks. It is particularly strong at multi-step reasoning and planning, with high-throughput inference designed for high-volume agent pipelines. It is part of the NVIDIA Nemotron family of open models for agentic AI.

Activity (last 14 days)

Prompt: 158M

Completion:

Total: 162M

Startup

Nvidia

Latency (p50): 28.36s

Throughput (p50): 7.1 tok/s

Pricing

Input: Free

Output: Free

Features

Input Modalities: text

Output Modalities: text

Supported Endpoints: Chat Completions

Vision

Supports Tools

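To get a rough sense of what the p50 latency and throughput figures listed for each model might mean in practice, the sketch below estimates the wall-clock time for a single response. It assumes, without confirmation from the listings, that latency is the wait before the first token and that tokens then arrive at the p50 throughput rate. The function names and the 500-token completion length are hypothetical, chosen only for illustration.

```python
# p50 figures copied from the two model listings above.
# The timing model (latency + tokens / throughput) is an assumption,
# not something the listings state.
MODELS = {
    "GLM 4.5 Air (free)": {"latency_s": 9.00, "throughput_tok_s": 22.5},
    "Nemotron 3 Ultra (free)": {"latency_s": 28.36, "throughput_tok_s": 7.1},
}


def estimated_seconds(latency_s: float, throughput_tok_s: float,
                      completion_tokens: int) -> float:
    """Estimate response time as initial latency plus generation time."""
    return latency_s + completion_tokens / throughput_tok_s


def estimate_all(completion_tokens: int) -> dict[str, float]:
    """Apply the estimate to every model in MODELS."""
    return {
        name: estimated_seconds(m["latency_s"], m["throughput_tok_s"],
                                completion_tokens)
        for name, m in MODELS.items()
    }


if __name__ == "__main__":
    # 500 completion tokens is an arbitrary illustrative length.
    for name, secs in estimate_all(500).items():
        print(f"{name}: about {secs:.0f} s for 500 completion tokens")
```

Because p50 values are medians taken over different requests, adding them gives only a ballpark figure; actual response times on free endpoints can vary widely.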