Compare GLM-4.5-Air (free) and DeepSeek V4 Flash on key metrics including price, context length, throughput, and other model features.
GLM-4.5-Air is the lightweight member of the GLM-4.5 flagship model family, designed specifically for agent-focused applications. Like GLM-4.5, it uses a Mixture-of-Experts (MoE) architecture, but with a smaller parameter footprint. GLM-4.5-Air also supports hybrid inference: a "thinking mode" for deeper reasoning and tool use, and a "non-thinking mode" for low-latency, real-time interactions.
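In practice, hybrid inference is usually exposed as a per-request switch. The sketch below builds a chat-completion payload with the reasoning mode toggled; it assumes an OpenAI-compatible Chat Completions request shape, and the `thinking` field name is a hypothetical illustration, not a confirmed parameter of either provider's API.

```python
# Sketch: toggling "thinking" (deep reasoning) vs "non-thinking" (real-time)
# requests for a hybrid-inference model. Request shape assumes an
# OpenAI-compatible Chat Completions API; the "thinking" field is a
# hypothetical stand-in for the provider's actual mode switch.

def build_request(model: str, prompt: str, thinking: bool) -> dict:
    """Return a chat-completion payload with the reasoning mode toggled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Hypothetical switch between deep reasoning and low-latency response.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

# Deep reasoning for an agent/tool-use step:
agentic = build_request("glm-4.5-air", "Plan the next tool call.", thinking=True)

# Low-latency mode for interactive chat:
chat = build_request("glm-4.5-air", "Hi!", thinking=False)

print(agentic["thinking"])  # {'type': 'enabled'}
print(chat["thinking"])     # {'type': 'disabled'}
```

Keeping the mode a per-request flag lets one deployment serve both agent pipelines and latency-sensitive chat without swapping models.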
DeepSeek V4 Flash is an efficiency-focused Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B active parameters, supporting a 1M-token context window. It is built for fast inference and high-throughput workloads while preserving strong reasoning and coding capabilities. The model features hybrid attention for efficient long-context processing and offers configurable reasoning modes. It is a strong fit for use cases such as coding assistants, chat applications, and agent workflows where responsiveness and cost efficiency matter.
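The efficiency claim follows directly from the MoE parameter split quoted above. This back-of-the-envelope sketch uses the stated figures (284B total, 13B active per token); the dense-model comparison is an assumption for contrast, since per-token compute scales with active rather than total parameters.

```python
# Back-of-the-envelope: why a small active-parameter count makes an MoE
# model cheap to serve. Figures come from the model description above
# (284B total, 13B active); the dense baseline is an assumed comparison.

total_params = 284e9   # all experts combined
active_params = 13e9   # parameters actually used per token

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%}")  # Active per token: 4.6%

# Relative to a hypothetical dense model of the same total size, the
# forward-pass FLOPs per token drop by roughly:
reduction = total_params / active_params
print(f"~{reduction:.0f}x fewer FLOPs per token")  # ~22x fewer FLOPs per token
```

This is why the model can preserve the capacity of a large parameter pool while delivering throughput closer to a 13B-class dense model.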