Compare GLM 5 Turbo and Step 3.5 Flash (free) on key metrics including price, context length, throughput, and other model features.
GLM-5 Turbo is a new model from Z.ai built for fast inference and strong performance in agent-based environments such as OpenClaw scenarios. It is heavily optimized for real-world agent workflows with long execution chains: better decomposition of complex instructions, improved tool use, scheduled and persistent execution, and greater stability across extended tasks.
Step 3.5 Flash is StepFun's most capable open-source foundation model, designed to deliver frontier-level reasoning and agentic performance with standout efficiency. It uses a sparse Mixture of Experts (MoE) architecture that activates only 11B of its 196B parameters per token, concentrating "intelligence density" to approach top proprietary models while remaining fast enough for real-time interaction. Built for rapid, deep reasoning, it is powered by 3-way Multi-Token Prediction (MTP-3), enabling typical generation speeds of 100–300 tok/s (up to ~350 tok/s in single-stream coding). For coding and long-horizon agent work, it integrates a scalable RL training framework that supports stable autonomous execution, reaching 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0. For long-context workloads, Step 3.5 Flash offers a cost-efficient 256K context window via a hybrid attention design with a 3:1 Sliding Window Attention ratio (three SWA layers per full-attention layer), maintaining performance on large codebases and massive documents while reducing the compute burden typical of long-context models.
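The two efficiency claims above can be made concrete with a small sketch: the 3:1 SWA-to-full-attention interleave determines which layers attend over the whole 256K window, and the sparse MoE activates only a small fraction of the total weights per token. The function names, the layer count, and the pattern alignment below are illustrative assumptions, not Step 3.5 Flash's actual configuration.

```python
# Hypothetical sketch of the described Step 3.5 Flash layout. The real model's
# layer count and exact interleave order are not public here; only the 3:1
# ratio and the 11B-of-196B activation figures come from the description.

def attention_schedule(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Interleave SWA and full-attention layers at the stated 3:1 ratio."""
    pattern = ["swa"] * swa_per_full + ["full"]
    return [pattern[i % len(pattern)] for i in range(num_layers)]

def active_param_fraction(active_b: float = 11.0, total_b: float = 196.0) -> float:
    """Fraction of parameters the sparse MoE activates per token."""
    return active_b / total_b

print(attention_schedule(8))
# ['swa', 'swa', 'swa', 'full', 'swa', 'swa', 'swa', 'full']
print(round(active_param_fraction(), 3))
# 0.056  → roughly 5.6% of total weights touched per token
```

In this sketch only every fourth layer pays the quadratic full-attention cost over the long context, which is where the reduced compute burden for 256K-token inputs comes from.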