MoonShotAI
Browse models from MoonShotAI
Kimi K2 Thinking
Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model, extending the K2 series into agentic, long-horizon reasoning. Built on a trillion-parameter Mixture-of-Experts (MoE) architecture, it activates 32 billion parameters per forward pass and supports a 256K-token context window. Optimized for persistent step-by-step thought and dynamic tool use, it enables complex reasoning workflows and stable agentic behavior across 200–300 sequential tool calls, setting new open-source records on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench. With MuonClip optimization and its large-scale MoE architecture, it delivers strong reasoning depth and high inference efficiency for demanding agentic and analytical tasks.
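As a rough illustration of the long-horizon tool-calling pattern described above, the sketch below drives a chat loop through an OpenAI-compatible client and executes tool calls the model requests until it returns a final answer. The base URL, the `OPENROUTER_API_KEY` environment variable, the model ID `moonshotai/kimi-k2-thinking`, and the `get_weather` tool are illustrative assumptions, not a documented integration.

```python
# Minimal sketch of an agentic tool-calling loop, assuming an
# OpenAI-compatible endpoint and the model ID "moonshotai/kimi-k2-thinking".
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # assumed endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Hypothetical tool the model may call while reasoning.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp_c": 21})  # stubbed result

messages = [{"role": "user", "content": "Is it jacket weather in Oslo?"}]

# Loop until the model stops requesting tools; in practice K2 Thinking may
# chain hundreds of such calls, so a real harness would cap iterations.
for _ in range(10):
    resp = client.chat.completions.create(
        model="moonshotai/kimi-k2-thinking",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
```

A production harness would add retries, a hard cap on total calls, and logging of intermediate tool results, but the request/execute/append cycle above is the core shape of the workflow.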
Kimi K2 0905
Kimi K2 0905 is the September update of Kimi K2 0711, a Mixture-of-Experts (MoE) language model from Moonshot AI with 1 trillion total parameters and 32 billion active per forward pass. The context window has been expanded to 256K tokens. This release improves agentic coding accuracy and generalization across scaffolds, and produces more aesthetic and functional frontend code for web, 3D, and similar tasks. Kimi K2 remains optimized for advanced tool use, reasoning, and code synthesis, excelling on benchmarks such as LiveCodeBench, SWE-bench, ZebraLogic, GPQA, Tau2, and AceBench. Its training uses a novel stack with the MuonClip optimizer for stable large-scale MoE training.
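To make the 256K-token window concrete, the sketch below runs a rough pre-flight size check before sending a long document for summarization. The chars/4 token heuristic, the endpoint, and the model ID `moonshotai/kimi-k2-0905` are assumptions for illustration, not exact tokenizer behavior or a documented configuration.

```python
# Rough pre-flight check against the advertised 256K-token context window;
# the chars/4 estimate and model ID "moonshotai/kimi-k2-0905" are assumed.
import os

from openai import OpenAI

CONTEXT_LIMIT = 256_000   # advertised window, in tokens
RESERVED_OUTPUT = 4_000   # headroom for the completion

def rough_token_count(text: str) -> int:
    # Crude estimate: ~4 characters per token for English prose.
    return len(text) // 4

def summarize(document: str) -> str:
    if rough_token_count(document) > CONTEXT_LIMIT - RESERVED_OUTPUT:
        raise ValueError("Document likely exceeds the context window; chunk it first.")
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # assumed endpoint
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="moonshotai/kimi-k2-0905",
        messages=[
            {"role": "system", "content": "Summarize the user's document."},
            {"role": "user", "content": document},
        ],
        max_tokens=RESERVED_OUTPUT,
    )
    return resp.choices[0].message.content
```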
Kimi K2
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
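Since code synthesis is one of the model's headline uses, here is a minimal streamed request sketch against an OpenAI-compatible endpoint. The base URL, environment variable, and model ID `moonshotai/kimi-k2` are assumptions for illustration.

```python
# Minimal sketch of a streamed code-synthesis request, assuming an
# OpenAI-compatible endpoint and the model ID "moonshotai/kimi-k2".
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # assumed endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

stream = client.chat.completions.create(
    model="moonshotai/kimi-k2",
    messages=[{
        "role": "user",
        "content": "Write a Python function that merges two sorted lists.",
    }],
    stream=True,
)

# Print tokens as they arrive; delta.content is None on some chunks.
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```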