GLM 4.5 Air (free) vs Gemini 2.5 Flash Lite

Compare GLM 4.5 Air (free) and Gemini 2.5 Flash Lite on key metrics including price, context length, throughput, and other model features.

Author: Z.ai

Context Length: 131.1k

Supports Tools

GLM-4.5-Air is the lightweight version of our newest flagship model family, designed specifically for agent-focused applications. Like GLM-4.5, it uses a Mixture-of-Experts (MoE) architecture, but with a smaller parameter footprint. GLM-4.5-Air also supports hybrid inference modes, including a "thinking mode" for deeper reasoning and tool usage, and a "non-thinking mode" for real-time interactions.

Activity

Last 14 days

Prompt: 503M

Completion: 30M

Total: 533M

Startup

Z.ai

Latency (p50): 5.67s

Throughput (p50): 25.8 tok/s

Pricing

Input: Free

Output: Free

Features

Input Modalities: text

Output Modalities: text

Supported Endpoints: Chat Completions

Vision

Supports Tools


Author: Google

Context Length: 1.0M

Supports Tools

Gemini 2.5 Flash-Lite is a streamlined reasoning model from the Gemini 2.5 family, designed for extremely low latency and cost-effectiveness. It delivers higher throughput, quicker token generation, and enhanced performance on standard benchmarks compared to previous Flash models.

Activity

Last 14 days

Prompt: 597M

Completion

Total: 604M

Startup

Google

Latency (p50): 0.67s

Throughput (p50): 86.2 tok/s
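
To put the latency and throughput figures side by side, the sketch below estimates how long each model might take to return a response of a given length. It assumes that the p50 latency approximates the wait before the first token and that tokens then arrive at the p50 throughput; that is a simplification, and real response times will vary. The function names and the 500-token example are illustrative, not taken from either listing.

```python
def estimated_seconds(latency_s: float, throughput_tps: float, output_tokens: int) -> float:
    """Rough estimate: wait for the first token, then stream at a steady rate."""
    return latency_s + output_tokens / throughput_tps


# (p50 latency in seconds, p50 throughput in tokens/second) from the listings
MODELS = {
    "GLM 4.5 Air (free)": (5.67, 25.8),
    "Gemini 2.5 Flash Lite": (0.67, 86.2),
}


def compare(output_tokens: int) -> dict[str, float]:
    """Return the estimated response time per model, rounded to 2 decimals."""
    return {
        name: round(estimated_seconds(latency, tps, output_tokens), 2)
        for name, (latency, tps) in MODELS.items()
    }


if __name__ == "__main__":
    for name, seconds in compare(500).items():
        print(f"{name}: ~{seconds} s for 500 output tokens")
```

Under these assumptions, a 500-token response works out to roughly 25 s for GLM 4.5 Air (free) and roughly 6.5 s for Gemini 2.5 Flash Lite.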

Pricing

Input: $0.05/M tokens

Output: $0.20/M tokens
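
As a worked example of the per-million-token pricing, the sketch below computes the cost of a single workload at the listed Gemini 2.5 Flash Lite rates. The workload size (1M prompt tokens, 100k completion tokens) and the function name are hypothetical, chosen only to illustrate the arithmetic; GLM 4.5 Air (free) would cost $0 for the same workload at its listed prices.

```python
# Listed Gemini 2.5 Flash Lite prices, in USD per million tokens
INPUT_USD_PER_M = 0.05
OUTPUT_USD_PER_M = 0.20


def cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_price: float = INPUT_USD_PER_M,
    output_price: float = OUTPUT_USD_PER_M,
) -> float:
    """Cost = prompt tokens at the input rate + completion tokens at the output rate."""
    return (prompt_tokens / 1_000_000) * input_price + (
        completion_tokens / 1_000_000
    ) * output_price


if __name__ == "__main__":
    # Hypothetical workload: 1M prompt tokens, 100k completion tokens
    print(f"Gemini 2.5 Flash Lite: ${cost_usd(1_000_000, 100_000):.4f}")
    # Free tier: both rates are zero
    print(f"GLM 4.5 Air (free): ${cost_usd(1_000_000, 100_000, 0.0, 0.0):.4f}")
```

For this workload the input side contributes $0.05 and the output side $0.02, for about $0.07 in total.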

Features

Input Modalities: text, image, file, audio

Output Modalities: text

Supported Endpoints: Chat Completions

Vision

Supports Tools
