Compare GLM 4.5 Air (free) and MiniMax M3 on key metrics including price, context length, throughput, and other model features.
GLM-4.5-Air is the lightweight version of our newest flagship model family, designed specifically for agent-focused applications. Like GLM-4.5, it uses a Mixture-of-Experts (MoE) architecture, but with a smaller parameter footprint. GLM-4.5-Air also supports hybrid inference modes, including a "thinking mode" for deeper reasoning and tool usage, and a "non-thinking mode" for real-time interactions.
MiniMax-M3 is a multimodal foundation model from MiniMax. It accepts text, image, and video inputs and produces text output, offers a 1M-token context window, and is suited to long-horizon agentic work, coding, and tool use. It is built on MiniMax Sparse Attention (MSA), which replaces full attention with KV-block selection to cut per-token compute at long context: at 1M tokens the cost is roughly 1/20 that of the previous generation, with substantially faster prefill and decode while retaining quality across most tasks. The model is trained natively multimodal on interleaved data and tuned for multi-turn, production-like collaboration via an interactive user-simulator framework, so it is oriented toward sustained, multi-step tasks rather than single-turn execution.
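To make the idea of KV-block selection more concrete, here is a minimal, generic sketch in plain Python. It is not MiniMax's implementation: the scoring rule (dot product of the query with each block's mean key), the function names, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration. The sketch only shows the general pattern of attending over a few selected key/value blocks instead of the whole sequence.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Standard softmax attention for one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def select_blocks(query, keys, block_size, top_k):
    """Hypothetical selector: rank KV blocks by query . mean(block keys)."""
    n_blocks = math.ceil(len(keys) / block_size)
    scored = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), b))
    scored.sort(reverse=True)
    # Return the chosen block indices in sequence order.
    return sorted(b for _, b in scored[:top_k])


def block_sparse_attend(query, keys, values, block_size, top_k):
    """Attend only over the keys/values inside the top_k selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    sel_keys, sel_values = [], []
    for b in chosen:
        sel_keys.extend(keys[b * block_size:(b + 1) * block_size])
        sel_values.extend(values[b * block_size:(b + 1) * block_size])
    return attend(query, sel_keys, sel_values), chosen


# Toy usage: 8 tokens, 4 blocks of 2 tokens, keep the 2 best blocks.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [0.0, 1.0], [5.0, 0.0], [5.0, 0.0],
        [0.0, -1.0], [0.0, -1.0], [4.0, 0.0], [4.0, 0.0]]
values = [[float(i)] for i in range(8)]
output, chosen = block_sparse_attend(query, keys, values, block_size=2, top_k=2)
print(chosen)  # [1, 3]
```

In this sketch the softmax runs over `top_k * block_size` keys rather than the full sequence, plus one cheap score per block, which is where the per-token savings at long context come from. Setting `top_k` to the total number of blocks reproduces full attention exactly.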