Models
Explore a broad selection of AI models available on the NagaAI platform.
Kimi K2.5 is Moonshot AI’s proprietary multimodal model, offering cutting-edge visual coding abilities and supporting a self-directed agent swarm approach. Developed from Kimi K2 and further trained on around 15 trillion mixed visual and text tokens, it achieves excellent results in general reasoning, visual coding, and autonomous tool invocation.
Ideal for high-quality image manipulation, style transfer, and sequential editing workflows.
FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.
FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing within the same unified architecture.
FLUX.2 [klein] 4B is the quickest and most budget-friendly model in the FLUX.2 family, designed for high-throughput workloads while still delivering excellent image quality.
GPT-5.2-Codex is an enhanced version of GPT-5.1-Codex, optimized for software engineering and coding tasks. Designed for both interactive sessions and longer, independent execution, it excels at building projects, developing features, debugging, large-scale refactoring, and code review. Compared to its predecessor, 5.2-Codex follows developer instructions more closely and delivers cleaner, higher-quality code. It integrates seamlessly with developer tools—CLI, IDEs, GitHub, and cloud platforms—and adapts its reasoning effort based on task complexity, providing quick responses for simple tasks and maintaining extended performance for larger projects. The model supports structured code reviews, detects critical flaws, validates behavior against tests, and handles multimodal inputs like images for UI work. It is specifically tailored for agentic coding applications.
Qwen-Image-Edit-2511 is the latest proprietary image editing model from Qwen, delivering substantial upgrades over its predecessor, Qwen-Image-Edit-2509. The new version features notable improvements in editing consistency, especially in multi-subject scenarios and character preservation, allowing for more faithful subject representation across edited images. Integrated support for popular community LoRAs now enables advanced lighting control and novel viewpoint generation natively. In addition, Qwen-Image-Edit-2511 offers enhanced industrial design capabilities, robust geometric reasoning for technical annotations, and improved fusion of multiple images. These advances result in more reliable, visually coherent, and creative image editing—making Qwen-Image-Edit-2511 a powerful and versatile tool for both imaginative and practical visual applications.
Seedream 4.5 is the newest proprietary image generation model from ByteDance. Compared to Seedream 4.0, it offers substantial overall improvements—particularly in editing consistency, where it better maintains subject details, lighting, and color tones. The model also delivers enhanced portrait clarity and improved small-text rendering. Its ability to compose multiple images has been significantly upgraded, and advances in both inference performance and visual aesthetics allow for more accurate and artistically expressive image creation.
Gemini 3 Flash Preview is a high-speed, cost-effective reasoning model built for agent-driven workflows, multi-turn conversation, and coding support. Offering near-Pro level performance in both reasoning and tool use, it stands out by delivering significantly lower latency than larger Gemini versions—making it ideal for interactive development, long-running agent loops, and collaborative programming. Compared to Gemini 2.5 Flash, it features notable improvements in reasoning ability, multimodal comprehension, and overall reliability. The model supports a 1M token context window and handles multimodal inputs—text, images, audio, video, and PDFs—with text-based output. Features like configurable reasoning levels, structured outputs, tool integration, and automatic context caching make it a strong choice for users seeking powerful agentic capabilities without the high cost or lag of more extensive models.
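As a rough sketch of how those configurable reasoning levels might be set, assuming NagaAI exposes an OpenAI-compatible chat completions endpoint and a `gemini-3-flash-preview` model ID (both illustrative assumptions, not confirmed platform values):

```python
from openai import OpenAI

# Hypothetical base URL and key; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gemini-3-flash-preview",  # assumed model identifier
    reasoning_effort="low",          # configurable reasoning level: less effort, faster answers
    messages=[
        {"role": "user", "content": "Outline a retry strategy for a flaky HTTP API."},
    ],
)
print(response.choices[0].message.content)
```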
GPT-Image-1.5 is the flagship image generation and editing model from OpenAI, designed for precise, natural, and fast creation. It reliably follows user instructions down to fine details, preserving critical elements like lighting, composition, and facial likeness across edits and generations. GPT-Image-1.5 excels at a wide range of editing tasks—including addition, removal, stylization, combination, and advanced text rendering—producing images that closely match user intent. With up to 4× faster generation speeds compared to previous versions, it streamlines creative workflows, enabling quick iterations whether you need a simple fix or a total visual transformation. Enhanced integration and lower API costs make GPT-Image-1.5 ideal for marketing, product visualization, ecommerce, and creative tools scenarios, while its dedicated editor and presets provide a delightful, accessible creative space for both practical and expressive image work.
GPT-5.2 Chat (also known as Instant) is the fast and lightweight version of the 5.2 family, built for low-latency chatting while maintaining strong general intelligence. It leverages adaptive reasoning to focus more “thinking” on challenging queries, boosting accuracy in math, coding, and multi-step tasks without sacrificing speed in everyday conversations. The model is naturally warmer and more conversational, with improved instruction following and more stable short-form reasoning. GPT-5.2 Chat is ideal for high-throughput, interactive scenarios where quick response and consistency are more important than in-depth analysis.
GPT-5.2 Pro is OpenAI's most advanced model, featuring significant upgrades in agentic coding and long-context capabilities compared to GPT-5 Pro. It is specifically optimized for handling complex tasks that demand step-by-step reasoning, precise instruction following, and accuracy in critical scenarios. The model supports advanced test-time routing and sophisticated prompt understanding, including user cues like "think hard about this." Key improvements include reduced hallucination and sycophancy, along with stronger performance in coding, writing, and health-related tasks.
GPT-5.2 is the newest frontier-level model in the GPT-5 line, providing enhanced agentic abilities and better long-context performance than GPT-5.1. It employs adaptive reasoning to dynamically distribute computational resources, enabling quick responses to simple requests and deeper analysis for complex challenges. Designed for wide-ranging tasks, GPT-5.2 offers steady improvements in mathematics, programming, science, and tool usage, delivering more coherent long-form responses and increased reliability when using tools.
GPT-5.1-Codex-Max is OpenAI’s newest agentic coding model, created for extended, high-context software development tasks. Built on an enhanced 5.1 reasoning stack, it’s been trained with agentic workflows covering software engineering, mathematics, and research. GPT-5.1-Codex-Max offers faster performance, better reasoning abilities, and increased token efficiency throughout the development process.
Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total). Released under the Apache 2.0 license.
Claude Opus 4.5 is Anthropic’s latest reasoning model, developed for advanced software engineering, complex agent workflows, and extended computer tasks. It features strong multimodal capabilities and performs competitively on real-world coding and reasoning benchmarks, with enhanced resilience against prompt injection. Optimized for efficiency at varying effort levels, it allows developers to balance speed, depth, and token usage according to their specific needs, thanks to a new parameter for controlling token efficiency. Opus 4.5 excels in advanced tool integration, contextual management, and multi-agent coordination, making it ideal for autonomous research, debugging, complex planning, and spreadsheet or browser manipulation. Compared to previous Opus generations, it delivers significant improvements in structured reasoning, long-duration task performance, execution reliability, and alignment, all while reducing token overhead.
Gemini 3 Pro Image Preview (Nano Banana Pro) is Google’s most advanced image generation and editing model, built on Gemini 3 Pro. Compared with the original Nano Banana, it offers much improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model produces context-rich visuals—from infographics and diagrams to cinematic composites—and can incorporate up-to-the-minute information through Search grounding. It leads the industry with sophisticated text rendering in images, handles consistent multi-image blending, and maintains accurate identity preservation for up to five subjects. Nano Banana Pro gives users fine-grained creative controls like localized edits, lighting and focus adjustments, camera transformations, 2K/4K output, and flexible aspect ratios. Tailored for professional design, product visualization, storyboarding, and complex compositions, it remains efficient for everyday image creation needs.
Grok 4.1 Fast Reasoning is xAI's most capable tool-calling model, engineered for production-grade agentic applications with a 2M token context window. Achieving state-of-the-art results on Berkeley Function Calling v4 and leading agentic search benchmarks like Research-Eval Reka (63.9) and FRAMES (87.6), it excels at multi-turn conversations, long-horizon planning, and autonomous task execution. Built through RL training in real-world simulated environments, Grok 4.1 Fast Reasoning delivers exceptional performance on complex enterprise scenarios like customer support and finance while cutting hallucination rates in half compared to its predecessor.
Grok 4.1 Fast Non-Reasoning is xAI's high-speed variant optimized for instant responses and straightforward queries, featuring a 2M token context window. Designed for production workflows requiring rapid inference without deep reasoning overhead, it maintains strong tool-calling capabilities and multi-turn consistency while delivering faster response times. Ideal for real-time applications, customer-facing chatbots, and scenarios where speed is critical, Grok 4.1 Fast Non-Reasoning balances performance with cost-effectiveness for efficient, production-ready agent deployments.
Gemini 3 Pro Preview is Google’s most advanced AI model, setting new records on leading benchmarks like LMArena (1501 Elo), GPQA Diamond (91.9%), and MathArena Apex (23.4%). With powerful reasoning and deep multimodal understanding across text, images, code, video, and audio, Gemini 3 Pro Preview delivers nuanced, context-aware responses and excels at complex problem-solving, scientific analysis, and creative coding tasks.
GPT-5.1-Codex-Mini is a more compact and faster variant of GPT-5.1-Codex.
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It's designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, closely follows developer instructions, and produces cleaner, higher-quality code. Codex integrates into developer environments like the CLI, IDE extensions, GitHub, and cloud tasks. It adapts its reasoning dynamically—providing quick answers for small tasks and sustaining long, multi-hour runs for large projects. The model is trained for structured code reviews, identifying critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs like images or screenshots for UI development and integrates tools for search, dependency installation, and environment setup. Codex is specifically intended for agentic coding applications.
GPT-5.1 Chat (also known as Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
GPT-5.1 is the newest top-tier model in the GPT-5 series, featuring enhanced general reasoning, better instruction following, and a more natural conversational tone compared to GPT-5. With adaptive reasoning, it dynamically adjusts its computational effort—responding swiftly to simple queries and diving deeper into complex tasks. Explanations are now clearer and use less jargon, making challenging topics easier to grasp. Designed for a wide range of tasks, GPT-5.1 consistently improves performance in math, coding, and structured analysis, offering more cohesive long-form responses and more reliable tool usage. Its conversation style is warmer and more intuitive, yet still precise. GPT-5.1 stands as the main, fully capable successor to GPT-5.
Seedream 4.0 is ByteDance’s advanced text-to-image and image editing model, designed for high-speed, high-resolution image generation and robust contextual understanding. It unifies generation and editing in a single architecture, supports complex visual tasks with natural-language instructions, and excels at multi-reference batches and diverse style transfers. Seedream 4.0 stands out for its ability to handle both content creation and modification, offering creative professionals and enterprises an all-in-one, efficient solution for imaginative and knowledge-driven visual tasks.
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, offering near-frontier intelligence with much lower cost and latency than larger Claude models. It matches Claude Sonnet 4’s performance in reasoning, coding, and computer-use tasks, making it ideal for real-time and large-scale applications. Haiku 4.5 introduces controllable reasoning depth, supports summarized or interleaved thought outputs, and enables tool-assisted workflows across coding, bash, web search, and computer-use tools. With over 73% on SWE-bench Verified, it stands among the top coding models while maintaining fast responsiveness for sub-agents, parallel execution, and scaled deployment.
Gemini 2.5 Flash Image, also known as "Nano Banana," is a state-of-the-art image generation model with strong contextual understanding. It supports image generation, editing, and multi-turn conversational interactions.
Mistral Moderation 2411 is a content moderation model from Mistral, offering high-accuracy text moderation across nine safety categories and multiple languages. It is designed for robust, real-time moderation in diverse environments.
Amazon Nova Pro 1.0 is a versatile multimodal model from Amazon, designed to balance accuracy, speed, and cost across a wide range of tasks. As of December 2024, it delivers state-of-the-art results on key benchmarks like visual question answering (TextVQA) and financial document analysis. The model excels in processing both visual and textual information, though video input is currently not supported.
Amazon Nova Lite 1.0 is a low-cost multimodal model from Amazon, designed for fast processing of image, video, and text inputs to generate text output. It handles real-time customer interactions, document analysis, and visual question answering with high accuracy. With a 300K token input context, it can process multiple images or up to 30 minutes of video in a single input.
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral, optimized for instruction following, repetition reduction, and improved function calling. It supports both image and text inputs, delivers strong performance across coding, STEM, and vision benchmarks, and is designed for efficient, structured output generation.
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model, optimized for real-world agents and coding workflows. It achieves state-of-the-art results on coding benchmarks like SWE-bench Verified, with notable improvements in system design, code security, and following specifications. Designed for extended autonomous operation, the model maintains task continuity across sessions and offers fact-based progress tracking. Sonnet 4.5 features enhanced agentic abilities, such as improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With better context tracking and awareness of token usage across tool calls, it excels in multi-context and long-running workflows. Key use cases include software engineering, cybersecurity, financial analysis, research agents, and other areas requiring sustained reasoning and tool use.
Gemini 2.5 Flash-Lite Preview September 2025 Checkpoint is a lightweight, high-throughput model from the Gemini 2.5 family, focused on ultra-low latency and cost efficiency. It delivers even faster token generation, concise output, and improved performance on standard benchmarks compared to earlier Flash-Lite models, making it ideal for large-scale, real-time applications.
Gemini 2.5 Flash Preview September 2025 Checkpoint is Google’s high-performance model, built for advanced reasoning, code generation, mathematical tasks, and scientific applications. This version introduces faster, more efficient output and smarter tool use for complex, multi-step workflows.
Qwen3-VL-235B-A22B Thinking is a multimodal model that combines advanced text generation with visual understanding for images and video, specifically optimized for multimodal reasoning in STEM and math. It delivers robust perception, strong spatial (2D/3D) understanding, and long-form visual comprehension, showing competitive performance in public benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction, tool use, following complex instructions in multi-image dialogues, aligning text with video timelines, and automating GUI operations. The model also enables visual coding workflows, such as turning sketches into code and assisting with UI debugging, while maintaining strong text-only capabilities on par with Qwen3 language models. This makes it ideal for use cases like document AI, multilingual OCR, UI/software help, spatial reasoning, and vision-language agent research.
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that combines strong text generation with advanced visual understanding for images and video. Designed for general vision-language tasks like VQA, document parsing, chart and table extraction, and multilingual OCR, the model emphasizes robust perception, spatial (2D/3D) understanding, and long-form visual comprehension, with competitive results on public benchmarks. Qwen3-VL also supports agentic interaction and tool use, following complex instructions in multi-image dialogues, aligning text to video timelines, operating GUIs for automation, and enabling visual coding workflows such as turning sketches into code or debugging UIs. Its strong text-only capabilities match Qwen3 language models, making it suitable for document AI, OCR, UI/software assistance, spatial reasoning, and vision-language agent research.
Qwen-Image-Edit-2509 is the September 2025 release of the Qwen-Image-Edit model. It introduces multi-image editing capabilities by building on the original architecture and further training with image concatenation, supporting combinations like “person + person,” “person + product,” and “person + scene,” with optimal performance for 1 to 3 images. For single-image editing, Qwen-Image-Edit-2509 delivers improved consistency, particularly in person editing (better facial identity preservation and support for various portrait styles), product editing (enhanced product identity retention), and text editing (support for modifying fonts, colors, and materials in addition to content). The model also natively supports ControlNet features, such as depth maps, edge maps, and keypoint maps.
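A minimal sketch of single-image editing through an OpenAI-compatible images endpoint; the base URL and the `qwen-image-edit-2509` model ID below are assumptions, not confirmed platform identifiers:

```python
from openai import OpenAI

# Hypothetical base URL, key, and model ID; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

with open("portrait.png", "rb") as image_file:
    result = client.images.edit(
        model="qwen-image-edit-2509",  # assumed model identifier
        image=image_file,
        prompt="Change the jacket to red leather while preserving the face.",
    )

# Depending on the endpoint, the result arrives as a URL or a base64 payload.
print(result.data[0].url or result.data[0].b64_json)
```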
GPT-5-Codex is a specialized version of GPT-5 tailored for software engineering and coding tasks. It is suitable for both interactive development sessions and the independent execution of complex engineering projects. The model is capable of building projects from scratch, developing new features, debugging, performing large-scale refactoring, and conducting code reviews. Compared to the standard GPT-5, Codex offers greater steerability, follows developer instructions more closely, and delivers cleaner, higher-quality code.
Grok 4 Fast Non-Reasoning is a blazing-fast variant for instant, cost-effective answers without reasoning traces. Built on the same Grok 4 Fast backbone for unified quality and efficiency, it excels at search, summarization, Q&A, and lightweight agent use. It delivers low latency and reduced token cost, supports the 2M token context for long inputs, and is well suited to rapid, scalable information workflows.
Grok 4 Fast Reasoning is a state-of-the-art reasoning model optimized for cost-efficient, high-quality chain-of-thought. Trained end-to-end with tool use and agentic search, it matches top-tier benchmarks like AIME, HMMT, and GPQA at 40% lower token use than Grok 4, and offers up to 98% cheaper reasoning than previous models. With a 2M token context and native web/X browsing, it is ideal for agentic workflows, research, code, logic, and complex multi-step tasks.
Gemini 2.5 Flash Image Preview is a state-of-the-art image generation model with contextual understanding. It is capable of image generation, editing, and multi-turn conversations.
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed for tasks combining visual and textual data. It excels at image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it is ideal for content creation, AI-driven customer service, and research.
A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.
GPT-5 Chat is tailored for advanced, natural, and context-aware conversations in enterprise environments. It leverages the latest advancements in OpenAI’s conversational AI, supporting multimodal and dynamic dialogue with enhanced context retention and user intent understanding.
The smallest and fastest member of the GPT-5 family, optimized for developer tools, rapid user interactions, and ultra-low latency environments. While it offers limited reasoning depth compared to larger models, GPT-5-Nano preserves essential instruction-following and safety mechanisms. It is the successor to GPT-4.1-nano and is best suited for real-time, cost-sensitive, or embedded applications.
A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.
OpenAI’s most advanced large language model, engineered for high-stakes applications requiring step-by-step reasoning, precise instruction following, and robust code generation. GPT-5 introduces major improvements in factual accuracy, user intent understanding, and hallucination reduction. It supports advanced prompt routing, user-specified intent (such as "think hard about this"), and is optimized for complex workflows in coding, writing, and health-related domains.
Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.
Qwen-Image is a foundation image generation model from the Qwen team, excelling at high-fidelity text rendering, complex text integration (including English and Chinese), and diverse artistic styles. It supports advanced editing features such as style transfer, object manipulation, and human pose editing, and is suitable for both image generation and understanding tasks.
Gemini 2.5 Flash-Lite is a streamlined reasoning model from the Gemini 2.5 family, designed for extremely low latency and cost-effectiveness. It delivers higher throughput, quicker token generation, and enhanced performance on standard benchmarks compared to previous Flash models.
Flux-1-Kontext-Max is a premium text-based image editing model from Black Forest Labs, delivering maximum performance and advanced typography generation for transforming images through natural language prompts. It is designed for high-end creative and professional use.
Flux-1-Kontext-Pro is a state-of-the-art text-based image editing model from Black Forest Labs, providing high-quality, prompt-adherent output for transforming images using natural language. It is optimized for consistent results and advanced editing tasks.
Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.
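The reasoning-token budget could be set through the SDK's `extra_body` escape hatch; note that the `reasoning_max_tokens` field name below is hypothetical and stands in for whatever parameter the platform actually exposes, as are the base URL and model ID:

```python
from openai import OpenAI

# Hypothetical base URL and key; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gemini-2.5-flash",  # assumed model identifier
    messages=[
        {"role": "user", "content": "How many weighings find the fake coin among 12?"},
    ],
    # HYPOTHETICAL parameter name, shown only to illustrate a capped reasoning budget.
    extra_body={"reasoning_max_tokens": 1024},
)
print(response.choices[0].message.content)
```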
Grok 4 is xAI’s latest reasoning model, featuring a 256k context window and support for parallel tool calling, structured outputs, and both image and text inputs. Designed for high-throughput, complex reasoning tasks, with pricing that scales for large token requests.
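Parallel tool calling follows the standard chat-completions tools shape; a sketch, assuming a `grok-4` model ID on an OpenAI-compatible endpoint (both unverified assumptions):

```python
from openai import OpenAI

# Hypothetical base URL, key, and model ID; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, implemented by your application
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="grok-4",  # assumed model identifier
    messages=[{"role": "user", "content": "Compare today's weather in Paris and Tokyo."}],
    tools=tools,
    parallel_tool_calls=True,  # allow several tool calls in a single turn
)
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```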
Gemini 2.0 Flash Lite is optimized for extremely fast response times and low cost, while maintaining the quality of larger models. Ideal for real-time and large-scale applications.
Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.
Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. Employs “thinking” capabilities for nuanced context handling and achieves top-tier performance on multiple benchmarks, including first-place on the LMArena leaderboard.
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. Trained with large-scale reinforcement learning for chain-of-thought reasoning, it is optimized for math, science, programming, and other STEM tasks, consistently achieving PhD-level accuracy on industry benchmarks.
Google’s latest open-source multimodal model, Gemma 3 27B, supports vision-language input and text outputs, handles context windows up to 128k tokens, and understands over 140 languages. Offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal model, supporting both text and image inputs with text outputs. Delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.
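Image inputs use the standard multimodal message format; a minimal sketch against an assumed OpenAI-compatible endpoint (the base URL is illustrative):

```python
from openai import OpenAI

# Hypothetical base URL and key; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```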
The latest GPT-4 Turbo model with vision capabilities, supporting JSON mode and function calling. Trained on data up to December 2023, it is optimized for high-throughput, multimodal applications.
The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON schema in responses. Maintains high intelligence and efficiency, with enhanced non-English and visual performance.
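Schema-constrained output can be requested via `response_format`, following the published JSON-schema shape; only the base URL below is an assumption:

```python
from openai import OpenAI

# Hypothetical base URL and key; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract: 'Ada Lovelace, born 1815, London.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,  # enforce exact schema conformance
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "birth_year": {"type": "integer"},
                    "city": {"type": "string"},
                },
                "required": ["name", "birth_year", "city"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # JSON string matching the schema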
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Features early fusion for native multimodality and a 1 million token context window.
Omni-Moderation is OpenAI’s newest multimodal content moderation model, available through the Moderation API. It is designed to identify potentially harmful content in both text and images, offering improved accuracy and granular control, especially in non-English languages.
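Mixed text-and-image inputs follow the Moderation API's multimodal format; a sketch, with the base URL again an illustrative assumption:

```python
from openai import OpenAI

# Hypothetical base URL and key; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

result = client.moderations.create(
    model="omni-moderation-latest",
    input=[
        {"type": "text", "text": "User-submitted caption to screen."},
        {"type": "image_url", "image_url": {"url": "https://example.com/upload.png"}},
    ],
)
moderation = result.results[0]
print(moderation.flagged)     # True if any category triggered
print(moderation.categories)  # per-category booleans
```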
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
A fine-tuned version of o4-mini, specifically optimized for use in Codex CLI. Recommended for code-related tasks, with improved performance in code generation and completion.
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. It is capable of understanding documents, charts, and natural images, and is available under both research and commercial licenses. The model is designed for advanced document and image analysis tasks.
GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal model, supporting both text and image inputs with text outputs. Delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.
The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON schema in responses. Maintains high intelligence and efficiency, with enhanced non-English and visual performance.
A well-rounded, powerful model from OpenAI, setting new standards in math, science, coding, and visual reasoning. Excels at technical writing and instruction-following, and is designed for multi-step problem solving across text, code, and images. BYOK is required for access.
xAI’s Grok 2 Vision 1212 is a next-generation vision-language model designed for advanced image-based AI applications. It features robust visual comprehension, refined instruction-following, and strong multilingual support. The model excels at object recognition, style analysis, and visual reasoning, empowering developers to build intuitive, visually aware applications. Enhanced steerability and reasoning capabilities make it a solid foundation for next-generation image solutions. For more details, see the official [xAI announcement](https://x.ai/blog/grok-1212).
A specialized GPT-4o variant trained to understand and execute web search queries within chat completions.
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
A compact reasoning model in OpenAI’s o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. Supports tool use and demonstrates competitive reasoning and coding performance across benchmarks, outperforming its predecessor o3-mini and approaching o3 in some domains. Well-suited for high-throughput scenarios where latency or cost is critical.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.
The continually updated version of OpenAI ChatGPT 4o, always pointing to the current GPT-4o model used by ChatGPT. Incorporates additional RLHF and may differ from the API version. Intended for research and evaluation, not recommended for production as it may be redirected or removed in the future.
The fastest and most cost-effective model in the GPT-4.1 series, designed for tasks demanding low latency such as classification and autocompletion. Maintains a 1 million token context window and delivers exceptional performance at a small size, outperforming even some larger models on key benchmarks.
A mid-sized GPT-4.1 model delivering performance competitive with GPT-4o at substantially lower latency and cost. Retains a 1 million token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.
The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. Maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.
Mistral Medium 3 is a high-performance, enterprise-grade language model that balances state-of-the-art reasoning and multimodal capabilities with significantly reduced operational cost. It excels in coding, STEM reasoning, and enterprise adaptation, and is optimized for scalable deployments across professional and industrial use cases, including hybrid and on-prem environments.
Claude Sonnet 4 is a next-generation model from Anthropic, significantly enhancing coding and reasoning capabilities over its predecessor. It achieves state-of-the-art performance on SWE-bench, balances capability and computational efficiency, and is optimized for both routine and complex software development projects. Key features include improved codebase navigation, reduced error rates, and increased reliability in following intricate instructions.
Llama Guard 4 is a multimodal content safety classifier derived from Llama 4 Scout, fine-tuned for both prompt and response classification. It supports content moderation for English and multiple languages, including mixed text-and-image prompts. The model is aligned with the MLCommons hazards taxonomy and is integrated into the Llama Moderations API for robust safety classification in text and images.
OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving SOTA intelligence and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.
Mistral-Small-3.1-24B-Instruct-2503 is a 24B parameter model from Mistral, optimized for instruction following and efficient, structured output generation. It supports both image and text inputs and delivers strong performance across coding, STEM, and vision benchmarks.
OpenAI’s new state-of-the-art image generation model. This is a natively multimodal language model that accepts both text and image inputs and produces image outputs. It powers image generation in ChatGPT, offering exceptional prompt adherence, a high level of detail, and quality.
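Assuming this entry corresponds to the `gpt-image-1` model ID (an inference, not stated above), generation through an OpenAI-compatible images endpoint might look like this sketch:

```python
import base64

from openai import OpenAI

# Hypothetical base URL and key; substitute the real NagaAI values.
client = OpenAI(base_url="https://api.naga.example/v1", api_key="YOUR_API_KEY")

image = client.images.generate(
    model="gpt-image-1",  # assumed model identifier
    prompt="A watercolor fox reading a newspaper at dawn",
    size="1024x1024",
)

# The model returns base64-encoded image bytes.
with open("fox.png", "wb") as f:
    f.write(base64.b64decode(image.data[0].b64_json))
```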
Gemini Flash 2.0 offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.
Claude 3.7 Sonnet is an advanced large language model from Anthropic, featuring improved reasoning, coding, and problem-solving abilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model excels in agentic workflows, front-end development, and full-stack updates, and offers an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following.
OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving SOTA intelligence and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.
The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. Maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.
Gemini Flash 2.0 offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.
A flagship large language model from OpenAI, optimized for advanced instruction following, real-world software engineering, and long-context reasoning. Supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 in coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding. Tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.