DeepSeek

Token usage over time

Browse models from DeepSeek

12 models

DeepSeek-V3.1 is a 671B-parameter hybrid reasoning model (37B active), supporting both "thinking" and "non-thinking" modes via prompt templates. It extends DeepSeek-V3 with two-phase long-context training (up to 128K tokens) and uses FP8 microscaling for efficient inference. The model excels in tool use, code generation, and reasoning, with performance comparable to DeepSeek-R1 but with faster responses. It supports structured tool calling, code agents, and search agents, making it ideal for research and agentic workflows. Successor to DeepSeek V3-0324, it delivers strong performance across diverse tasks.
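A minimal sketch of how the thinking/non-thinking split might be exercised in practice, assuming an OpenAI-compatible gateway that surfaces the two modes as separate model slugs; the base URL, environment variable, and slug names below are illustrative assumptions, not identifiers confirmed by this listing.

```python
# Sketch: calling a hybrid reasoning model in non-thinking vs thinking mode
# through an OpenAI-compatible endpoint. Base URL, env var, and model slugs
# are hypothetical placeholders, not identifiers from this listing.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://example-router.invalid/v1",  # assumed OpenAI-compatible gateway
    api_key=os.environ["ROUTER_API_KEY"],          # assumed environment variable
)

def ask(prompt: str, thinking: bool = False) -> str:
    # Assumed convention: the provider exposes the two modes as two slugs.
    model = "deepseek/deepseek-chat-v3.1:thinking" if thinking else "deepseek/deepseek-chat-v3.1"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("Summarize the trade-offs of FP8 inference.", thinking=False))
print(ask("Prove that the sum of two even integers is even.", thinking=True))
```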

by DeepSeek
Free

DeepSeek v3.2 Exp

248M Tokens

DeepSeek-V3.2-Exp is an experimental large language model from DeepSeek, serving as an intermediate step between V3.1 and future architectures. It features DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that enhances training and inference efficiency for long-context tasks while preserving high output quality.
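DSA's exact sparsity pattern is not described here, so the following is only a toy illustration of the general idea behind fine-grained sparse attention (each query attends to a small, selected subset of keys rather than the full context), not DeepSeek's implementation.

```python
# Toy illustration of fine-grained sparse attention (NOT DeepSeek's DSA):
# each query attends only to its top-k highest-scoring keys, so the softmax
# and weighted sum touch k values instead of the full sequence length.
import numpy as np

def sparse_attention(q, k, v, top_k=4):
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # full scores (toy; real systems avoid computing these)
    keep = np.argsort(scores, axis=-1)[:, -top_k:]  # indices of the k best keys per query
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, keep, np.take_along_axis(scores, keep, axis=-1), axis=-1)
    weights = np.exp(mask - mask.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the kept keys only
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(8, 16)), rng.normal(size=(128, 16)), rng.normal(size=(128, 16))
print(sparse_attention(q, k, v, top_k=8).shape)  # (8, 16)
```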

by DeepSeek
$0.14/1M input tokens · $0.20/1M output tokens
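To turn the listed rates into a per-request cost, multiply each token count by its rate per million tokens and sum the two sides. The sketch below uses this card's listed prices ($0.14 and $0.20 per 1M tokens); the token counts in the usage line are made-up illustrative numbers.

```python
# Estimate request cost from the listed per-million-token rates.
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Rates are USD per 1M tokens, as shown on the model cards."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

# DeepSeek v3.2 Exp as listed above: $0.14/1M input, $0.20/1M output.
# 12,000 prompt tokens and 1,500 completion tokens are illustrative numbers.
print(f"${request_cost(12_000, 1_500, 0.14, 0.20):.6f}")  # ≈ $0.001980
```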

DeepSeek Chat v3.1

389M Tokens

DeepSeek-V3.1 is a 671B-parameter hybrid reasoning model (37B active), supporting both "thinking" and "non-thinking" modes via prompt templates. It extends DeepSeek-V3 with two-phase long-context training (up to 128K tokens) and uses FP8 microscaling for efficient inference. The model excels in tool use, code generation, and reasoning, with performance comparable to DeepSeek-R1 but with faster responses. It supports structured tool calling, code agents, and search agents, making it ideal for research and agentic workflows. Successor to DeepSeek V3-0324, it delivers strong performance across diverse tasks.
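The description mentions structured tool calling; the sketch below shows the conventional OpenAI-compatible tools flow, with a hypothetical tool schema and an assumed model slug rather than identifiers confirmed by this listing.

```python
# Minimal sketch of structured tool calling through an OpenAI-compatible API.
# The tool schema, base URL, and model slug are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="https://example-router.invalid/v1", api_key="...")

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool
        "description": "Search internal documentation and return snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",  # assumed slug
    messages=[{"role": "user", "content": "Find our FP8 deployment notes."}],
    tools=tools,
)

message = resp.choices[0].message
if message.tool_calls:  # the model may also answer directly without a tool call
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```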

by DeepSeek
$0.10/1M input tokens · $0.40/1M output tokens

DeepSeek v3.2 Speciale

DeepSeek-V3.2-Speciale is a high-compute version of DeepSeek-V3.2, designed for maximum reasoning and agentic performance. It builds on DeepSeek Sparse Attention (DSA) for efficient long-context handling and uses scaled reinforcement learning post-training to surpass the base model. Evaluations show that Speciale outperforms GPT-5 on complex reasoning tasks and matches Gemini-3.0-Pro in proficiency, while maintaining strong coding and tool-use reliability. Like V3.2, it uses a large-scale agentic task synthesis pipeline to enhance compliance and generalization in interactive environments.

by DeepSeek
$0.14/1M input tokens · $0.21/1M output tokens

DeepSeek Chat

7.76M Tokens

DeepSeek V3 is a 685B-parameter mixture-of-experts model and the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the previous DeepSeek V3 release and demonstrates strong performance across a variety of tasks.

by DeepSeek
$0.07/1M input tokens · $0.14/1M output tokens

DeepSeek v3.2

664K Tokens

DeepSeek-V3.2 is a large language model optimized for high computational efficiency and strong tool-use reasoning. It features DeepSeek Sparse Attention (DSA), a mechanism that lowers training and inference costs while maintaining quality in long-context tasks. A scalable reinforcement learning post-training framework further enhances reasoning, achieving performance comparable to GPT-5 and earning top results on the 2025 IMO and IOI. V3.2 also leverages large-scale agentic task synthesis to improve reasoning in practical tool-use scenarios, boosting its generalization and compliance in interactive environments.

by DeepSeek
$0.14/1M input tokens · $0.21/1M output tokens

DeepSeek Reasoner

21.6M Tokens

DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1, using more compute and advanced post-training techniques to push its reasoning and inference capabilities to the level of flagship models like o3 and Gemini 2.5 Pro. It excels on math, programming, and logic leaderboards, and its distilled 8B-parameter variant rivals much larger models on key benchmarks.
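Reasoning models in this family typically return the reasoning trace separately from the final answer. DeepSeek's first-party API documents a reasoning_content field on the message for its deepseek-reasoner endpoint; whether other providers forward that field is an assumption, so the sketch below guards for its absence.

```python
# Sketch: reading the separated reasoning trace and final answer from an
# R1-style model. DeepSeek's own API documents `reasoning_content`; other
# gateways may rename or omit it, so treat that field as an assumption there.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")  # or another compatible provider

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # DeepSeek's own slug for the R1 line
    messages=[{"role": "user", "content": "Is 9,997 prime? Explain briefly."}],
)

msg = resp.choices[0].message
reasoning = getattr(msg, "reasoning_content", None)  # may be absent on other gateways
if reasoning:
    print("--- reasoning trace ---")
    print(reasoning)
print("--- final answer ---")
print(msg.content)
```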

by DeepSeek
$0.28/1M input tokens · $1.10/1M output tokens

DeepSeek v3.1 Terminus

DeepSeek-V3.1 Terminus is an enhanced version of DeepSeek V3.1 that retains the original model’s capabilities while resolving user-reported issues, such as language consistency and agent functionality. The update further refines the model’s performance in coding and search agent tasks. This large-scale hybrid reasoning model (671B parameters, 37B active) supports both thinking and non-thinking modes. Building on the DeepSeek-V3 foundation, it incorporates a two-phase long-context training approach, allowing for up to 128K tokens, and adopts FP8 microscaling for more efficient inference.

by DeepSeek
$0.14/1M input tokens · $0.50/1M output tokens

DeepSeek-V3.2-Exp is an experimental large language model from DeepSeek, serving as an intermediate step between V3.1 and future architectures. It features DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that enhances training and inference efficiency for long-context tasks while preserving high output quality.

by DeepSeek
Free

DeepSeek Prover v2

46.4K Tokens

DeepSeek Prover V2 is a 671B-parameter model, speculated to be geared toward logic and mathematics. It is likely an upgrade from DeepSeek-Prover-V1.5, but it was released without an official announcement or detailed documentation.
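Given the lack of documentation, the following is purely speculative: if Prover V2 follows the Lean 4 theorem-proving focus of its DeepSeek-Prover-V1.5 predecessor, one plausible way to use it is to send a Lean statement with a sorry placeholder and ask for a completed proof. The slug and endpoint below are assumptions.

```python
# Speculative sketch: prompting a Prover-lineage model with a Lean 4 theorem
# statement. Model slug and base URL are assumptions, not confirmed details.
from openai import OpenAI

client = OpenAI(base_url="https://example-router.invalid/v1", api_key="...")

theorem = """theorem add_comm_example (a b : Nat) : a + b = b + a := by
  sorry"""

resp = client.chat.completions.create(
    model="deepseek/deepseek-prover-v2",  # assumed slug
    messages=[{"role": "user", "content": f"Complete the Lean 4 proof:\n{theorem}"}],
)
print(resp.choices[0].message.content)
```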

by DeepSeek
$0.35/1M input tokens · $1.25/1M output tokens

DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1, using more compute and advanced post-training techniques to push its reasoning and inference capabilities to the level of flagship models like o3 and Gemini 2.5 Pro. It excels on math, programming, and logic leaderboards, and its distilled 8B-parameter variant rivals much larger models on key benchmarks.

by DeepSeek
Free

DeepSeek V3 is a 685B-parameter mixture-of-experts model and the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the previous DeepSeek V3 release and demonstrates strong performance across a variety of tasks.

by DeepSeek
Free