Models
Explore a broad selection of AI models available on the NagaAI platform.
Kimi K2.5 is Moonshot AI’s proprietary multimodal model, offering cutting-edge visual coding abilities and supporting a self-directed agent swarm approach. Developed from Kimi K2 and further trained on around 15 trillion mixed visual and text tokens, it achieves excellent results in general reasoning, visual coding, and autonomous tool invocation.
As a SOTA 30B-class model, GLM-4.7-Flash provides a new option that balances efficiency and performance. It has been further optimized for agentic coding scenarios, enhancing coding abilities, long-term task planning, and tool integration, and has demonstrated leading results among open-source models of its size on multiple current public benchmark leaderboards.
GPT-5.2-Codex is an enhanced version of GPT-5.1-Codex, optimized for software engineering and coding tasks. Designed for both interactive sessions and longer, independent execution, it excels at building projects, developing features, debugging, large-scale refactoring, and code review. Compared to its predecessor, 5.2-Codex follows developer instructions more closely and delivers cleaner, higher-quality code. It integrates seamlessly with developer tools—CLI, IDEs, GitHub, and cloud platforms—and adapts its reasoning effort based on task complexity, providing quick responses for simple tasks and maintaining extended performance for larger projects. The model supports structured code reviews, detects critical flaws, validates behavior against tests, and handles multimodal inputs like images for UI work. It is specifically tailored for agentic coding applications.
MiniMax-M2.1 is a cutting-edge, lightweight large language model designed for coding, agentic workflows, and modern application development. With just 10 billion activated parameters, it offers a significant boost in real-world performance while ensuring low latency, high scalability, and cost-effectiveness. Compared to the previous version, M2.1 delivers more concise, clearer outputs and quicker response times. It excels in multilingual coding, achieving 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, making it an adaptable engine for IDEs, coding tools, and a wide range of assistant applications.
GLM-4.7 is Z.AI’s newest flagship model, offering advancements in two main aspects: improved programming abilities and greater stability in multi-step reasoning and execution. It shows notable progress in handling complex agent tasks, while also providing more natural conversational experiences and enhanced front-end design.
MiMo-V2-Flash is an open-source foundational language model created by Xiaomi, featuring a Mixture-of-Experts architecture with 309 billion total parameters (15 billion active) and a hybrid attention mechanism. It supports hybrid-thinking, offers a 256K context window, and excels in reasoning, coding, and agent-based tasks. Ranking #1 among open-source models worldwide on SWE-bench Verified and Multilingual benchmarks, MiMo-V2-Flash matches the performance of Claude Sonnet 4.5 at just 3.5% of the cost. For best and fastest results when using agentic tools like Claude Code, Cline, or Roo Code, be sure to disable reasoning mode, as the model is extensively optimized for these scenarios.
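Over an API, that reasoning toggle typically lives in the request rather than the prompt. Below is a minimal sketch using the OpenAI-compatible Python client; the endpoint placeholder, model id, and the `reasoning` flag are assumptions, so check the platform docs for the exact names.

```python
from openai import OpenAI

# Placeholder endpoint and key; substitute your provider's values.
client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="mimo-v2-flash",  # assumed model id
    messages=[{"role": "user", "content": "Refactor this function to use pathlib."}],
    # Assumed flag name for disabling reasoning; verify against the platform docs.
    extra_body={"reasoning": {"enabled": False}},
)
print(resp.choices[0].message.content)
```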
Gemini 3 Flash Preview is a high-speed, cost-effective reasoning model built for agent-driven workflows, multi-turn conversation, and coding support. Offering near-Pro level performance in both reasoning and tool use, it stands out by delivering significantly lower latency than larger Gemini versions—making it ideal for interactive development, long-running agent loops, and collaborative programming. Compared to Gemini 2.5 Flash, it features notable improvements in reasoning ability, multimodal comprehension, and overall reliability. The model supports a 1M token context window and handles multimodal inputs—text, images, audio, video, and PDFs—with text-based output. Features like configurable reasoning levels, structured outputs, tool integration, and automatic context caching make it a strong choice for users seeking powerful agentic capabilities without the high cost or lag of more extensive models.
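The configurable reasoning levels mentioned above can be set through the google-genai SDK's thinking config; in the sketch below the `thinking_level` field and the model id are assumptions based on this description, so verify them against Google's current documentation.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

resp = client.models.generate_content(
    model="gemini-3-flash-preview",  # assumed model id
    contents="Summarize the tradeoffs between BFS and DFS.",
    config=types.GenerateContentConfig(
        # Assumed field: Gemini 3 exposes discrete reasoning levels rather
        # than a raw token budget.
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(resp.text)
```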
GPT-5.2 Chat (also known as Instant) is the fast and lightweight version of the 5.2 family, built for low-latency chatting while maintaining strong general intelligence. It leverages adaptive reasoning to focus more “thinking” on challenging queries, boosting accuracy in math, coding, and multi-step tasks without sacrificing speed in everyday conversations. The model is naturally warmer and more conversational, with improved instruction following and more stable short-form reasoning. GPT-5.2 Chat is ideal for high-throughput, interactive scenarios where quick response and consistency are more important than in-depth analysis.
GPT-5.2 Pro is OpenAI's most advanced model, featuring significant upgrades in agentic coding and long-context capabilities compared to GPT-5 Pro. It is specifically optimized for handling complex tasks that demand step-by-step reasoning, precise instruction following, and accuracy in critical scenarios. The model supports advanced test-time routing and sophisticated prompt understanding, including user cues like "think hard about this." Key improvements include reduced hallucination and sycophancy, along with stronger performance in coding, writing, and health-related tasks.
GPT-5.2 is the newest frontier-level model in the GPT-5 line, providing enhanced agentic abilities and better long-context performance than GPT-5.1. It employs adaptive reasoning to dynamically distribute computational resources, enabling quick responses to simple requests and deeper analysis for complex challenges. Designed for wide-ranging tasks, GPT-5.2 offers steady improvements in mathematics, programming, science, and tool usage, delivering more coherent long-form responses and increased reliability when using tools.
GPT-5.1-Codex-Max is OpenAI’s newest agentic coding model, created for extended, high-context software development tasks. Built on an enhanced 5.1 reasoning stack, it’s been trained with agentic workflows covering software engineering, mathematics, and research. GPT-5.1-Codex-Max offers faster performance, better reasoning abilities, and increased token efficiency throughout the development process.
DeepSeek-V3.2 is a large language model optimized for high computational efficiency and strong tool-use reasoning. It features DeepSeek Sparse Attention (DSA), a mechanism that lowers training and inference costs while maintaining quality in long-context tasks. A scalable reinforcement learning post-training framework further enhances reasoning, achieving performance comparable to GPT-5 and earning top results on the 2025 IMO and IOI. V3.2 also leverages large-scale agentic task synthesis to improve reasoning in practical tool-use scenarios, boosting its generalization and compliance in interactive environments.
Claude Opus 4.5 is Anthropic’s latest reasoning model, developed for advanced software engineering, complex agent workflows, and extended computer tasks. It features strong multimodal capabilities and performs competitively on real-world coding and reasoning benchmarks, with enhanced resilience against prompt injection. Optimized for efficiency at varying effort levels, it allows developers to balance speed, depth, and token usage according to their specific needs, thanks to a new parameter for controlling token efficiency. Opus 4.5 excels in advanced tool integration, contextual management, and multi-agent coordination, making it ideal for autonomous research, debugging, complex planning, and spreadsheet or browser manipulation. Compared to previous Opus generations, it delivers significant improvements in structured reasoning, long-duration task performance, execution reliability, and alignment, all while reducing token overhead.
Gemini 3 Pro Image Preview (Nano Banana Pro) is Google’s most advanced image generation and editing model, built on Gemini 3 Pro. Building on the original Nano Banana, it offers much improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model produces context-rich visuals—from infographics and diagrams to cinematic composites—and can incorporate up-to-the-minute information through Search grounding. It leads the industry with sophisticated text rendering in images, handles consistent multi-image blending, and maintains accurate identity preservation for up to five subjects. Nano Banana Pro gives users fine-grained creative controls like localized edits, lighting and focus adjustments, camera transformations, 2K/4K output, and flexible aspect ratios. Tailored for professional design, product visualization, storyboarding, and complex compositions, it remains efficient for everyday image creation needs.
Gemini 3 Pro Preview is Google’s most advanced AI model, setting new records on leading benchmarks like LMArena (1501 Elo), GPQA Diamond (91.9%), and MathArena Apex (23.4%). With powerful reasoning and deep multimodal understanding across text, images, code, video, and audio, Gemini 3 Pro Preview delivers nuanced, context-aware responses and excels at complex problem-solving, scientific analysis, and creative coding tasks.
GPT-5.1-Codex-Mini is a more compact and faster variant of GPT-5.1-Codex.
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It's designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, closely follows developer instructions, and produces cleaner, higher-quality code. Codex integrates into developer environments like the CLI, IDE extensions, GitHub, and cloud tasks. It adapts its reasoning dynamically—providing quick answers for small tasks and sustaining long, multi-hour runs for large projects. The model is trained for structured code reviews, identifying critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs like images or screenshots for UI development and integrates tools for search, dependency installation, and environment setup. Codex is specifically intended for agentic coding applications.
GPT-5.1 Chat (also known as Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
GPT-5.1 is the newest top-tier model in the GPT-5 series, featuring enhanced general reasoning, better instruction following, and a more natural conversational tone compared to GPT-5. With adaptive reasoning, it dynamically adjusts its computational effort—responding swiftly to simple queries and diving deeper into complex tasks. Explanations are now clearer and use less jargon, making challenging topics easier to grasp. Designed for a wide range of tasks, GPT-5.1 consistently improves performance in math, coding, and structured analysis, offering more cohesive long-form responses and more reliable tool usage. Its conversation style is warmer and more intuitive, yet still precise. GPT-5.1 stands as the main, fully capable successor to GPT-5.
Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model, extending the K2 series into agentic, long-horizon reasoning. Built on a trillion-parameter Mixture-of-Experts (MoE) architecture, it activates 32 billion parameters per forward pass and supports a 256k-token context window. Optimized for persistent step-by-step thought and dynamic tool use, it enables complex reasoning workflows and stable multi-agent behavior across 200–300 tool calls, setting new open-source records on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench. With MuonClip optimization and large-scale MoE architecture, it delivers strong reasoning depth and high inference efficiency for demanding agentic and analytical tasks.
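The tool use described above follows the standard OpenAI-compatible function-calling shape. A minimal single-turn sketch is below; the endpoint, key, model id, and the `get_weather` tool are placeholders, not part of the model's actual API surface.

```python
from openai import OpenAI

client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kimi-k2-thinking",  # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Osaka right now?"}],
    tools=tools,
)
# In a full agent loop you would execute the returned tool call, append the
# result as a "tool" message, and call the API again until no calls remain.
print(resp.choices[0].message.tool_calls)
```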
MiniMax-M2 is a compact, efficient language model with 10B active (230B total) parameters, optimized for coding and agentic workflows. It achieves near-frontier reasoning and tool use with low latency and deployment cost. The model excels in code generation, multi-file editing, compile-run-fix cycles, and automated test repair, showing strong results on SWE-Bench and Terminal-Bench. MiniMax-M2 performs well in agentic benchmarks like BrowseComp and GAIA, handling long-term planning, retrieval, and error recovery. With a small activation footprint, it delivers fast inference and high concurrency, making it ideal for developer tools, agents, and applications that demand cost-effective, responsive reasoning.
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, offering near-frontier intelligence with much lower cost and latency than larger Claude models. It matches Claude Sonnet 4’s performance in reasoning, coding, and computer-use tasks, making it ideal for real-time and large-scale applications. Haiku 4.5 introduces controllable reasoning depth, supports summarized or interleaved thought outputs, and enables tool-assisted workflows across coding, bash, web search, and computer-use tools. With over 73% on SWE-bench Verified, it stands among the top coding models while maintaining fast responsiveness for sub-agents, parallel execution, and scaled deployment.
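The controllable reasoning depth maps onto Anthropic's documented extended-thinking API, where an explicit token budget caps internal deliberation. A minimal sketch follows; the model id is an assumption.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-haiku-4-5",  # assumed model id
    max_tokens=2048,  # must exceed the thinking budget
    # Extended thinking with a token budget (1024 is the documented minimum).
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
# The response interleaves "thinking" and "text" content blocks.
for block in resp.content:
    print(block.type)
```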
Gemini 2.5 Flash Image, also known as "Nano Banana," is a state-of-the-art image generation model with strong contextual understanding. It supports image generation, editing, and multi-turn conversational interactions.
GLM-4.6 is the latest version in the GLM series, featuring a longer 200K token context window (up from 128K in GLM-4.5) for handling more complex tasks. It offers improved coding performance with higher benchmark scores and better real-world results, including visually enhanced front-end code generation. The model also delivers stronger reasoning, more effective tool use during inference, better integration within agent frameworks, and more refined, human-like writing style compared to GLM-4.5.
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model, optimized for real-world agents and coding workflows. It achieves state-of-the-art results on coding benchmarks like SWE-bench Verified, with notable improvements in system design, code security, and following specifications. Designed for extended autonomous operation, the model maintains task continuity across sessions and offers fact-based progress tracking. Sonnet 4.5 features enhanced agentic abilities, such as improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With better context tracking and awareness of token usage across tool calls, it excels in multi-context and long-running workflows. Key use cases include software engineering, cybersecurity, financial analysis, research agents, and other areas requiring sustained reasoning and tool use.
DeepSeek-V3.2-Exp is an experimental large language model from DeepSeek, serving as an intermediate step between V3.1 and future architectures. It features DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that enhances training and inference efficiency for long-context tasks while preserving high output quality.
Gemini 2.5 Flash-Lite Preview September 2025 Checkpoint is a lightweight, high-throughput model from the Gemini 2.5 family, focused on ultra-low latency and cost efficiency. It delivers even faster token generation, concise output, and improved performance on standard benchmarks compared to earlier Flash-Lite models, making it ideal for large-scale, real-time applications.
Gemini 2.5 Flash Preview September 2025 Checkpoint is Google’s high-performance model, built for advanced reasoning, code generation, mathematical tasks, and scientific applications. This version introduces faster, more efficient output and smarter tool use for complex, multi-step workflows.
GPT-5-Codex is a specialized version of GPT-5 tailored for software engineering and coding tasks. It is suitable for both interactive development sessions and the independent execution of complex engineering projects. The model is capable of building projects from scratch, developing new features, debugging, performing large-scale refactoring, and conducting code reviews. Compared to the standard GPT-5, Codex offers greater steerability, follows developer instructions more closely, and delivers cleaner, higher-quality code.
DeepSeek-V3.1 Terminus is an enhanced version of DeepSeek V3.1 that retains the original model’s capabilities while resolving user-reported issues, such as language consistency and agent functionality. The update further refines the model’s performance in coding and search agent tasks. This large-scale hybrid reasoning model (671B parameters, 37B active) supports both thinking and non-thinking modes. Building on the DeepSeek-V3 foundation, it incorporates a two-phase long-context training approach, allowing for up to 128K tokens, and adopts FP8 microscaling for more efficient inference.
GLM-4.5 is the latest flagship foundation model from Z.AI, specifically designed for agent-based applications. It utilizes a Mixture-of-Experts (MoE) architecture and supports context lengths of up to 128k tokens. GLM-4.5 offers significantly improved capabilities in reasoning, code generation, and agent alignment. It features a hybrid inference mode with two options: a "thinking mode," tailored for complex reasoning and tool usage, and a "non-thinking mode," optimized for instant responses.
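In OpenAI-compatible deployments the hybrid inference mode is usually selected per request. The sketch below passes a `thinking` field of the general shape used in Z.AI's documentation; the field name, endpoint, and model id are assumptions to verify against your provider.

```python
from openai import OpenAI

client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="glm-4.5",  # assumed model id
    messages=[{"role": "user", "content": "Plan a three-step data migration."}],
    # Assumed toggle for selecting non-thinking mode; confirm the exact
    # field name with the platform docs before relying on it.
    extra_body={"thinking": {"type": "disabled"}},
)
print(resp.choices[0].message.content)
```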
Qwen3-Next-80B-A3B-Thinking is a reasoning-focused model that generates structured “thinking” traces by default. Suited for complex multi-step tasks like math proofs, code synthesis, logic, and agentic planning. Compared to earlier Qwen3 models, it’s more stable with long reasoning chains and scales efficiently during inference. Designed for agent frameworks, function calling, retrieval-based workflows, and benchmarks needing step-by-step solutions, it supports detailed completions and faster output through multi-token prediction. Runs only in thinking mode.
Gemini 2.5 Flash Image Preview is a state-of-the-art image generation model with contextual understanding. It is capable of image generation, editing, and multi-turn conversations.
DeepSeek-V3.1 is a 671B-parameter hybrid reasoning model (37B active), supporting both "thinking" and "non-thinking" modes via prompt templates. It extends DeepSeek-V3 with two-phase long-context training (up to 128K tokens) and uses FP8 microscaling for efficient inference. The model excels in tool use, code generation, and reasoning, with performance comparable to DeepSeek-R1 but with faster responses. It supports structured tool calling, code agents, and search agents, making it ideal for research and agentic workflows. Successor to DeepSeek V3-0324, it delivers strong performance across diverse tasks.
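On DeepSeek's first-party endpoint, the two modes are exposed as separate model ids (non-thinking and thinking respectively); other providers may instead switch modes through the chat template. A minimal sketch, assuming the standard DeepSeek ids:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

# "deepseek-chat" serves the non-thinking mode, "deepseek-reasoner" the
# thinking mode; both are backed by the same V3.1 weights.
for model in ("deepseek-chat", "deepseek-reasoner"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "How many primes are below 50?"}],
    )
    print(model, "->", resp.choices[0].message.content[:80])
```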
A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.
GPT-5 Chat is tailored for advanced, natural, and context-aware conversations in enterprise environments. It leverages the latest advancements in OpenAI’s conversational AI, supporting multimodal and dynamic dialogue with enhanced context retention and user intent understanding.
The smallest and fastest member of the GPT-5 family, optimized for developer tools, rapid user interactions, and ultra-low latency environments. While it offers limited reasoning depth compared to larger models, GPT-5-Nano preserves essential instruction-following and safety mechanisms. It is the successor to GPT-4.1-nano and is best suited for real-time, cost-sensitive, or embedded applications.
OpenAI’s most advanced large language model, engineered for high-stakes applications requiring step-by-step reasoning, precise instruction following, and robust code generation. GPT-5 introduces major improvements in factual accuracy, user intent understanding, and hallucination reduction. It supports advanced prompt routing, user-specified intent (such as "think hard about this"), and is optimized for complex workflows in coding, writing, and health-related domains.
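The "think hard about this" cue is plain prompt text rather than a dedicated parameter; a minimal sketch of how it might be used is below, with no assumptions beyond the model id given above.

```python
from openai import OpenAI

client = OpenAI()

# The routing cue lives in the prompt itself; no special parameter is needed.
resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{
        "role": "user",
        "content": "Think hard about this: why does quicksort degrade to "
                   "O(n^2) on already-sorted input with a first-element pivot?",
    }],
)
print(resp.choices[0].message.content)
```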
OpenAI’s 21B-parameter open-weight Mixture-of-Experts (MoE) model, released under the Apache 2.0 license. Features 3.6B active parameters per forward pass, optimized for low-latency inference and deployability on consumer or single-GPU hardware. Trained in OpenAI’s Harmony response format, it supports reasoning level configuration, fine-tuning, and agentic capabilities such as function calling and structured outputs.
An open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI, designed for high-reasoning, agentic, and general-purpose production use cases. Activates 5.1B parameters per forward pass and is optimized for single H100 GPU deployment with native MXFP4 quantization. Supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
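For both gpt-oss models, the Harmony format configures the reasoning level in the system message. A sketch using the Hugging Face pipeline API follows, patterned on the published model card; treat the exact phrasing of the system message as an assumption to check there.

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-120b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    # In the Harmony format the reasoning level is set in the system message.
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Derive the closed form of the geometric series."},
]
out = pipe(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1])  # the assistant's final message
```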
Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. Activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, and is instruction-tuned for step-by-step reasoning, tool use, agentic workflows, and multilingual tasks.
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. Optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. Features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts), and supports variable pricing based on context length.
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. Optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. Supports a native 262K context length and delivers significant gains in knowledge coverage, long-context reasoning, and coding benchmarks.
Gemini 2.5 Flash-Lite is a streamlined reasoning model from the Gemini 2.5 family, designed for extremely low latency and cost-effectiveness. It delivers higher throughput, quicker token generation, and enhanced performance on standard benchmarks compared to previous Flash models.
Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.
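The "max tokens for reasoning" control is exposed in the google-genai SDK as a thinking budget. A minimal sketch; setting the budget to 0 disables thinking entirely for this model.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="A train leaves at 09:00 at 80 km/h; when does it cover 200 km?",
    config=types.GenerateContentConfig(
        # Cap the tokens spent on internal reasoning; 0 disables thinking.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(resp.text)
```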
MiniMax-M1 is a large-scale, open-weight reasoning model with 456B total parameters and 45.9B active per token, leveraging a hybrid Mixture-of-Experts (MoE) architecture and a custom "lightning attention" mechanism. It supports context windows up to 1 million tokens and is optimized for long-context understanding, software engineering, agentic tool use, and mathematical reasoning. The model is trained via a custom reinforcement learning pipeline (CISPO) and demonstrates strong performance on FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench.
Gemini 2.0 Flash Lite is optimized for extremely fast response times and low cost, while maintaining the quality of larger models. Ideal for real-time and large-scale applications.
Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. Employs “thinking” capabilities for nuanced context handling and achieves top-tier performance on multiple benchmarks, including first-place on the LMArena leaderboard.
o1 is OpenAI's reasoning model family, designed to spend more time thinking before responding. Trained with large-scale reinforcement learning for chain-of-thought reasoning, it is optimized for math, science, programming, and other STEM tasks, consistently achieving PhD-level accuracy on industry benchmarks.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. Supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. Demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects.
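The thinking/non-thinking switch is exposed through the chat template's `enable_thinking` flag in the Hugging Face tokenizer, per Qwen's documented usage. A minimal sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a haiku about compilers."}]
# enable_thinking=False renders the non-thinking template; True (the default)
# makes the model emit a <think>...</think> block before its answer.
text = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tok([text], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```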
DeepSeek V3 is a 685B-parameter mixture-of-experts model and the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 release and demonstrates strong performance across a variety of tasks.
A well-rounded, powerful model from OpenAI, setting new standards in math, science, coding, and visual reasoning. Excels at technical writing and instruction-following, and is designed for multi-step problem solving across text, code, and images. BYOK (bring your own key) is required for access.
A compact reasoning model in OpenAI’s o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. Supports tool use and demonstrates competitive reasoning and coding performance across benchmarks, outperforming its predecessor o3-mini and approaching o3 in some domains. Well-suited for high-throughput scenarios where latency or cost is critical.
Claude Sonnet 4 is a next-generation model from Anthropic, significantly enhancing coding and reasoning capabilities over its predecessor. It achieves state-of-the-art performance on SWE-bench, balances capability and computational efficiency, and is optimized for both routine and complex software development projects. Key features include improved codebase navigation, reduced error rates, and increased reliability in following intricate instructions.
Llama Guard 4 is a multimodal content safety classifier derived from Llama 4 Scout, fine-tuned for both prompt and response classification. It supports content moderation for English and multiple languages, including mixed text-and-image prompts. The model is aligned with the MLCommons hazards taxonomy and is integrated into the Llama Moderations API for robust safety classification in text and images.
A cost-efficient language model from OpenAI, optimized for STEM reasoning tasks, especially in science, mathematics, and coding. Supports the `reasoning_effort` parameter for adjustable thinking time and features significant improvements over its predecessor, with better performance on complex questions and lower latency and cost.
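A minimal sketch using the OpenAI Python SDK's documented `reasoning_effort` parameter; the model id is assumed from the description above.

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o3-mini",  # assumed id for the model described above
    reasoning_effort="high",  # accepts "low" | "medium" | "high"
    messages=[{"role": "user", "content": "Integrate x^2 * e^x dx."}],
)
print(resp.choices[0].message.content)
```

Lower effort settings trade some accuracy on hard problems for reduced latency and cost, which suits high-throughput deployments.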
Command A is an open-weights 111B parameter model from Cohere, featuring a 256k context window and optimized for agentic, multilingual, and coding use cases. It delivers high performance with minimal hardware costs, excelling in business-critical workflows that require advanced reasoning, tool use, and language understanding across multiple languages.
Gemini Flash 2.0 offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.
Claude 3.7 Sonnet is an advanced large language model from Anthropic, featuring improved reasoning, coding, and problem-solving abilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model excels in agentic workflows, front-end development, and full-stack updates, and offers an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following.