Step 3.5 Flash is our most capable open-source foundation model, designed to deliver frontier-level reasoning and agentic performance with standout efficiency. It uses a sparse Mixture-of-Experts (MoE) architecture that activates only 11B of its 196B parameters per token, concentrating “intelligence density” to approach top proprietary models while staying fast enough for real-time interaction.

Built for rapid, deep reasoning, it is powered by 3-way Multi-Token Prediction (MTP-3), enabling typical generation speeds of 100–300 tok/s, and up to ~350 tok/s in single-stream coding.

For coding and long-horizon agent work, it integrates a scalable RL training framework that supports stable autonomous execution, reaching 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0.

For long-context workloads, Step 3.5 Flash offers a cost-efficient 256K context window via a hybrid attention design with a 3:1 Sliding Window Attention ratio (three SWA layers for every full-attention layer). This helps maintain performance on large codebases and massive documents while cutting the compute burden typical of long-context models.
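To make the sparse-activation idea concrete, here is a minimal, generic sketch of top-k MoE routing: a router scores every expert for each token, but only the k highest-scoring experts actually execute, so the active parameter count is a small fraction of the total. All names, the top-k gating scheme, and the toy expert functions below are illustrative assumptions, not Step 3.5 Flash's actual router.

```python
import math
import random

def topk_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gates."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)
    exps = [math.exp(logits[i] - m) for i in idx]
    z = sum(exps)
    return idx, [e / z for e in exps]

def moe_forward(x, experts, router, k=2):
    """Sparse MoE layer: only k of len(experts) expert networks run per token."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    idx, gates = topk_route(logits, k)
    out = [0.0] * len(x)
    for i, g in zip(idx, gates):
        h = experts[i](x)  # only these k experts execute; the rest are skipped
        out = [o + g * hi for o, hi in zip(out, h)]
    return out

# Toy setup: 8 experts, 2 active per token (hypothetical numbers for illustration)
random.seed(0)
d, n_experts, k = 4, 8, 2
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, n_experts + 1)]
router = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
x = [random.gauss(0, 1) for _ in range(d)]
y = moe_forward(x, experts, router, k)
```

With k=2 of 8 experts active, roughly a quarter of the expert parameters run per token; Step 3.5 Flash's 11B-of-196B ratio reflects the same principle at scale.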
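The throughput gain from multi-token prediction can be sketched as draft-and-verify decoding: extra MTP heads propose several future tokens, a single forward pass verifies them, and the longest matching prefix is accepted, so each pass emits between 1 and 4 tokens. The 80% per-token acceptance rate below is a hypothetical figure for illustration only; the source does not state how Step 3.5 Flash's MTP-3 decoding is implemented.

```python
import random

def mtp_decode_step(target_next, drafts):
    """Accept the longest prefix of drafted tokens matching the full model's
    own choices; the verification pass always contributes one more token,
    so each step emits 1..len(drafts)+1 tokens."""
    accepted = 0
    for d, t in zip(drafts, target_next):
        if d == t:
            accepted += 1
        else:
            break
    return accepted + 1

# Simulate 1000 decode steps with 3 draft heads (MTP-3-style) and a
# hypothetical 80% per-token draft acceptance rate.
random.seed(1)
steps, emitted = 1000, 0
for _ in range(steps):
    target = [random.randrange(100) for _ in range(3)]
    drafts = [t if random.random() < 0.8 else (t + 1) % 100 for t in target]
    emitted += mtp_decode_step(target, drafts)
tokens_per_pass = emitted / steps  # > 1 means faster than one-token-at-a-time
```

Under these assumptions each forward pass emits close to three tokens on average, which is the mechanism behind generation speeds well above what single-token decoding would allow.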
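The savings from the 3:1 hybrid attention layout can be estimated directly: a full-attention layer computes O(L²) attention scores over context length L, while a sliding-window layer computes only O(L·w) for window w. The 3:1 ratio comes from the text; the 4K window and 64-layer depth below are hypothetical values chosen only to make the arithmetic concrete.

```python
def layer_schedule(n_layers, swa_per_full=3):
    """Interleave sliding-window and full-attention layers at the given ratio."""
    pattern = ["swa"] * swa_per_full + ["full"]
    return [pattern[i % len(pattern)] for i in range(n_layers)]

def relative_cost(n_layers, seq_len, window, swa_per_full=3):
    """Hybrid attention score count as a fraction of all-full-attention cost."""
    cost = {"full": seq_len * seq_len, "swa": seq_len * min(window, seq_len)}
    hybrid = sum(cost[kind] for kind in layer_schedule(n_layers, swa_per_full))
    return hybrid / (n_layers * cost["full"])

# Hypothetical depth and window: 64 layers, 256K context, 4K sliding window
frac = relative_cost(64, 256 * 1024, 4 * 1024)
```

At a 256K context, three-quarters of the layers touch only a narrow window, so the attention-score cost drops to roughly a quarter of an all-full-attention stack (0.25 + 0.75·w/L), which is where the long-context efficiency comes from.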