Models

Explore AI models available through NagaAI.

Sort by

GPT 5.4 Pro

3.23M Tokens

GPT-5.4 Pro is OpenAI’s most capable model, extending GPT-5.4’s unified architecture with stronger reasoning for complex, high-stakes tasks. It includes a 1M+ token context window (922K input, 128K output) and supports both text and image inputs. Tuned for step-by-step reasoning, instruction following, and high accuracy, GPT-5.4 Pro stands out in agentic coding, long-context workflows, and multi-step problem solving.

by

$15.00/1M input tokens$90.00/1M output tokens

-50%

GPT 5.4

500M Tokens

GPT-5.4 is OpenAI’s newest frontier model that merges the Codex and GPT families into a single unified system. It offers a 1M+ token context window (922K input, 128K output) and supports both text and image inputs, enabling high-context reasoning, coding, and multimodal analysis in one workflow. The model brings stronger performance in coding, document understanding, tool usage, and instruction following. It’s built to be a solid default for both general tasks and software engineering—able to produce production-ready code, synthesize information from multiple sources, and handle complex multi-step workflows with fewer iterations and better token efficiency.

by

$1.25/1M input tokens$7.50/1M output tokens

-50%

GPT-5.3 Chat

7.99M Tokens

GPT-5.3 Chat is an updated version of ChatGPT’s most widely used model, designed to make everyday conversations smoother, more practical, and more directly helpful. It provides more accurate answers with stronger context awareness and noticeably cuts down on unnecessary refusals, excessive caveats, and overly cautious wording that can disrupt the flow of conversation.

by

$0.88/1M input tokens$7.00/1M output tokens

-50%

Gemini 3.1 Flash Lite Preview

44.6M Tokens

Gemini 3.1 Flash Lite Preview is Google’s high-efficiency model designed for high-throughput, high-volume use cases. It delivers better overall quality than Gemini 2.5 Flash Lite and comes close to Gemini 2.5 Flash performance across core capabilities. Enhancements include audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. It supports the full range of thinking levels (minimal, low, medium, high) to enable fine-grained cost/performance tuning. Pricing is set at half the cost of Gemini 3 Flash.

by

$0.13/1M input tokens$0.75/1M output tokens

-50%

Nano Banana 2 (Gemini 3.1 Flash Image Preview)

31.2M Tokens

Nano Banana 2 (Gemini 3.1 Flash Image) is Google DeepMind’s flagship Flash image model for high-fidelity generation and fast, advanced editing at scale, optimized for price–performance. It follows complex prompts more reliably and adds configurable thinking levels (Minimal vs High/Dynamic) to balance latency and quality. Nano Banana 2 improves in-image text rendering and supports in-image localization (generate/translate text across languages directly in the image), while leveraging stronger world knowledge and web image search for more grounded, realistic outputs. It supports native aspect ratios (including 4:1, 1:4, 8:1, 1:8) and 512px/1K/2K/4K resolutions.

by

$0.13/1M input tokens$0.75/1M output tokens

-50%

Gemini 3.1 Pro Preview

252M Tokens

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, offering stronger software engineering performance, more dependable agent behavior, and more efficient token use across demanding workflows. Built on the multimodal foundation of the Gemini 3 series, it delivers high-accuracy reasoning across text, images, video, audio, and code, supported by a 1M-token context window. The 3.1 update brings clear improvements on SWE benchmarks and in real-world coding scenarios, along with more robust autonomous task execution in structured areas like finance and spreadsheet-driven workflows. Created for advanced development and agentic systems, Gemini 3.1 Pro Preview enhances long-horizon stability and tool coordination while further improving token efficiency. It also adds a new medium thinking mode to better balance cost, speed, and quality. The model shines in agentic coding, structured planning, multimodal analysis, and workflow automation—making it a strong fit for autonomous agents, financial modeling, spreadsheet automation, and other high-context enterprise tasks.

by

$1.00/1M input tokens$6.00/1M output tokens

-50%

Gemini 3 Flash Preview

2.47B Tokens

Gemini 3 Flash Preview is a high-speed, cost-effective reasoning model built for agent-driven workflows, multi-turn conversation, and coding support. Offering near-Pro level performance in both reasoning and tool use, it stands out by delivering significantly lower latency than larger Gemini versions—making it ideal for interactive development, long-running agent loops, and collaborative programming. Compared to Gemini 2.5 Flash, it features notable improvements in reasoning ability, multimodal comprehension, and overall reliability. The model supports a 1M token context window and handles multimodal inputs—text, images, audio, video, and PDFs—with text-based output. Features like configurable reasoning levels, structured outputs, tool integration, and automatic context caching make it a strong choice for users seeking powerful agentic capabilities without the high cost or lag of more extensive models.

by

$0.25/1M input tokens$1.50/1M output tokens

-50%

GPT 5.2 Chat

80.3M Tokens

GPT-5.2 Chat (also known as Instant) is the fast and lightweight version of the 5.2 family, built for low-latency chatting while maintaining strong general intelligence. It leverages adaptive reasoning to focus more “thinking” on challenging queries, boosting accuracy in math, coding, and multi-step tasks without sacrificing speed in everyday conversations. The model is naturally warmer and more conversational, with improved instruction following and more stable short-form reasoning. GPT-5.2 Chat is ideal for high-throughput, interactive scenarios where quick response and consistency are more important than in-depth analysis.

by

$0.88/1M input tokens$7.00/1M output tokens

-50%

GPT 5.2 Pro

15.2M Tokens

GPT-5.2 Pro is OpenAI's most advanced model, featuring significant upgrades in agentic coding and long-context capabilities compared to GPT-5 Pro. It is specifically optimized for handling complex tasks that demand step-by-step reasoning, precise instruction following, and accuracy in critical scenarios. The model supports advanced test-time routing and sophisticated prompt understanding, including user cues like "think hard about this." Key improvements include reduced hallucination and sycophancy, along with stronger performance in coding, writing, and health-related tasks.

by

$11.00/1M input tokens$84.00/1M output tokens

-50%

GPT-5.2

574M Tokens

GPT-5.2 is the newest frontier-level model in the GPT-5 line, providing enhanced agentic abilities and better long-context performance than GPT-5.1. It employs adaptive reasoning to dynamically distribute computational resources, enabling quick responses to simple requests and deeper analysis for complex challenges. Designed for wide-ranging tasks, GPT-5.2 offers steady improvements in mathematics, programming, science, and tool usage, delivering more coherent long-form responses and increased reliability when using tools.

by

$0.88/1M input tokens$7.00/1M output tokens

-50%

Claude Opus 4.5

518M Tokens

Claude Opus 4.5 is Anthropic’s latest reasoning model, developed for advanced software engineering, complex agent workflows, and extended computer tasks. It features strong multimodal capabilities and performs competitively on real-world coding and reasoning benchmarks, with enhanced resilience against prompt injection. Optimized for efficiency at varying effort levels, it allows developers to balance speed, depth, and token usage according to their specific needs, thanks to a new parameter for controlling token efficiency. Opus 4.5 excels in advanced tool integration, contextual management, and multi-agent coordination, making it ideal for autonomous research, debugging, complex planning, and spreadsheet or browser manipulation. Compared to previous Opus generations, it delivers significant improvements in structured reasoning, long-duration task performance, execution reliability, and alignment, all while reducing token overhead.

by

$2.50/1M input tokens$12.50/1M output tokens

-50%

Gemini 3 Pro Image Preview (Nano Banana Pro)

95.1M Tokens

Gemini 3 Pro Image Preview (Nano Banana Pro) is Google’s most advanced image generation and editing model, built on Gemini 3 Pro. Building on the original Nano Banana, it offers much improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model produces context-rich visuals—from infographics and diagrams to cinematic composites—and can incorporate up-to-the-minute information through Search grounding. It leads the industry with sophisticated text rendering in images, handles consistent multi-image blending, and maintains accurate identity preservation for up to five subjects. Nano Banana Pro gives users fine-grained creative controls like localized edits, lighting and focus adjustments, camera transformations, 2K/4K output, and flexible aspect ratios. Tailored for professional design, product visualization, storyboarding, and complex compositions, it remains efficient for everyday image creation needs.

by

$1.00/1M input tokens$6.00/1M output tokens

-50%

Gemini 3 Pro Preview

631M Tokens

Gemini 3 Pro Preview is Google’s most advanced AI model, setting new records on leading benchmarks like LMArena (1501 Elo), GPQA Diamond (91.9%), and MathArena Apex (23.4%). With powerful reasoning and deep multimodal understanding across text, images, code, video, and audio, Gemini 3 Pro Preview delivers nuanced, context-aware responses and excels at complex problem-solving, scientific analysis, and creative coding tasks.

by

$1.00/1M input tokens$6.00/1M output tokens

-50%

GPT 5.1 Chat

8.45M Tokens

GPT-5.1 Chat (also known as Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.

by

$0.63/1M input tokens$5.00/1M output tokens

-50%

GPT 5.1

142M Tokens

GPT-5.1 is the newest top-tier model in the GPT-5 series, featuring enhanced general reasoning, better instruction following, and a more natural conversational tone compared to GPT-5. With adaptive reasoning, it dynamically adjusts its computational effort—responding swiftly to simple queries and diving deeper into complex tasks. Explanations are now clearer and use less jargon, making challenging topics easier to grasp. Designed for a wide range of tasks, GPT-5.1 consistently improves performance in math, coding, and structured analysis, offering more cohesive long-form responses and more reliable tool usage. Its conversation style is warmer and more intuitive, yet still precise. GPT-5.1 stands as the main, fully capable successor to GPT-5.

by

$0.63/1M input tokens$5.00/1M output tokens

-50%

Claude Haiku 4.5

513M Tokens

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, offering near-frontier intelligence with much lower cost and latency than larger Claude models. It matches Claude Sonnet 4’s performance in reasoning, coding, and computer-use tasks, making it ideal for real-time and large-scale applications. Haiku 4.5 introduces controllable reasoning depth, supports summarized or interleaved thought outputs, and enables tool-assisted workflows across coding, bash, web search, and computer-use tools. With over 73% on SWE-bench Verified, it stands among the top coding models while maintaining fast responsiveness for sub-agents, parallel execution, and scaled deployment.

by

$0.50/1M input tokens$2.50/1M output tokens

-50%

Claude Sonnet 4.5

1.23B Tokens

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model, optimized for real-world agents and coding workflows. It achieves state-of-the-art results on coding benchmarks like SWE-bench Verified, with notable improvements in system design, code security, and following specifications. Designed for extended autonomous operation, the model maintains task continuity across sessions and offers fact-based progress tracking. Sonnet 4.5 features enhanced agentic abilities, such as improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With better context tracking and awareness of token usage across tool calls, it excels in multi-context and long-running workflows. Key use cases include software engineering, cybersecurity, financial analysis, research agents, and other areas requiring sustained reasoning and tool use.

by

$1.50/1M input tokens$7.50/1M output tokens

-50%

Gemini 2.5 Flash Lite Preview 09-2025

130M Tokens

Gemini 2.5 Flash-Lite Preview September 2025 Checkpoint is a lightweight, high-throughput model from the Gemini 2.5 family, focused on ultra-low latency and cost efficiency. It delivers even faster token generation, concise output, and improved performance on standard benchmarks compared to earlier Flash-Lite models, making it ideal for large-scale, real-time applications.

by

$0.05/1M input tokens$0.20/1M output tokens

-50%

Gemini 2.5 Flash Preview 09-2025

183M Tokens

Gemini 2.5 Flash Preview September 2025 Checkpoint is Google’s high-performance model, built for advanced reasoning, code generation, mathematical tasks, and scientific applications. This version introduces faster, more efficient output and smarter tool use for complex, multi-step workflows.

by

$0.15/1M input tokens$1.25/1M output tokens

-50%

GPT-5 Codex

39.9M Tokens

GPT-5-Codex is a specialized version of GPT-5 tailored for software engineering and coding tasks. It is suitable for both interactive development sessions and the independent execution of complex engineering projects. The model is capable of building projects from scratch, developing new features, debugging, performing large-scale refactoring, and conducting code reviews. Compared to the standard GPT-5, Codex offers greater steerability, follows developer instructions more closely, and delivers cleaner, higher-quality code.

by

$0.63/1M input tokens$5.00/1M output tokens

-50%

GPT-5 Mini (Free)

1.41B Tokens

A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.

by

Free

GPT-5 Chat Latest

36.6M Tokens

GPT-5 Chat is tailored for advanced, natural, and context-aware conversations in enterprise environments. It leverages the latest advancements in OpenAI’s conversational AI, supporting multimodal and dynamic dialogue with enhanced context retention and user intent understanding.

by

$0.63/1M input tokens$5.00/1M output tokens

-50%

GPT-5 Nano

223M Tokens

The smallest and fastest member of the GPT-5 family, optimized for developer tools, rapid user interactions, and ultra-low latency environments. While it offers limited reasoning depth compared to larger models, GPT-5-Nano preserves essential instruction-following and safety mechanisms. It is the successor to GPT-4.1-nano and is best suited for real-time, cost-sensitive, or embedded applications.

by

$0.02/1M input tokens$0.20/1M output tokens

-50%

GPT-5 Mini

1.15B Tokens

A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.

by

$0.13/1M input tokens$1.00/1M output tokens

-50%

GPT-5

379M Tokens

OpenAI’s most advanced large language model, engineered for high-stakes applications requiring step-by-step reasoning, precise instruction following, and robust code generation. GPT-5 introduces major improvements in factual accuracy, user intent understanding, and hallucination reduction. It supports advanced prompt routing, user-specified intent (such as "think hard about this"), and is optimized for complex workflows in coding, writing, and health-related domains.

by

$0.63/1M input tokens$5.00/1M output tokens

-50%

GPT OSS 20B

817M Tokens

OpenAI’s 21B-parameter open-weight Mixture-of-Experts (MoE) model, released under the Apache 2.0 license. Features 3.6B active parameters per forward pass, optimized for low-latency inference and deployability on consumer or single-GPU hardware. Trained in OpenAI’s Harmony response format, it supports reasoning level configuration, fine-tuning, and agentic capabilities such as function calling and structured outputs.

by

$0.02/1M input tokens$0.10/1M output tokens

-50%

GPT OSS 120B

849M Tokens

An open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI, designed for high-reasoning, agentic, and general-purpose production use cases. Activates 5.1B parameters per forward pass and is optimized for single H100 GPU deployment with native MXFP4 quantization. Supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

by

$0.07/1M input tokens$0.30/1M output tokens

-50%

Claude Opus 4.1

33.9M Tokens

Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.

by

$10.00/1M input tokens$50.00/1M output tokens

-50%

Gemini 2.5 Flash Lite

1.7M Tokens

Gemini 2.5 Flash-Lite is a streamlined reasoning model from the Gemini 2.5 family, designed for extremely low latency and cost-effectiveness. It delivers higher throughput, quicker token generation, and enhanced performance on standard benchmarks compared to previous Flash models.

by

$0.05/1M input tokens$0.20/1M output tokens

-50%

Gemini 2.5 Flash (Free)

16.6B Tokens

Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.

by

Free

GPT-4 1106 Preview

1.35M Tokens

The April 2023 release of GPT-4 Turbo, supporting vision, JSON mode, and function calling. Trained on data up to April 2023, optimized for advanced multimodal tasks.

by

$5.00/1M input tokens$15.00/1M output tokens

-50%

Gemini 2.0 Flash Lite

21.2M Tokens

Gemini 2.0 Flash Lite is optimized for extremely fast response times and low cost, while maintaining the quality of larger models. Ideal for real-time and large-scale applications.

by

$0.04/1M input tokens$0.15/1M output tokens

-50%

Gemini 2.5 Flash

903M Tokens

Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.

by

$0.15/1M input tokens$1.25/1M output tokens

-50%

Gemini 2.5 Pro

971M Tokens

Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. Employs “thinking” capabilities for nuanced context handling and achieves top-tier performance on multiple benchmarks, including first-place on the LMArena leaderboard.

by

$0.63/1M input tokens$5.00/1M output tokens

-50%

O1

72.5K Tokens

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. Trained with large-scale reinforcement learning for chain-of-thought reasoning, it is optimized for math, science, programming, and other STEM tasks, consistently achieving PhD-level accuracy on industry benchmarks.

by

$7.50/1M input tokens$30.00/1M output tokens

-50%

GPT-4

924K Tokens

GPT-4.1, a flagship model for advanced instruction following, software engineering, and long-context reasoning. Supports a 1 million token context window and is tuned for precise code diffs, agent reliability, and high recall in large document contexts.

by

$15.00/1M input tokens$30.00/1M output tokens

-50%

GPT-4o 2024-05-13

133K Tokens

GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal model, supporting both text and image inputs with text outputs. Delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.

by

$2.50/1M input tokens$7.50/1M output tokens

-50%

GPT-4 Turbo

99.6K Tokens

The latest GPT-4 Turbo model with vision capabilities, supporting JSON mode and function calling. Trained on data up to December 2023, it is optimized for high-throughput, multimodal applications.

by

$5.00/1M input tokens$15.00/1M output tokens

-50%

GPT-4 Turbo Preview

171K Tokens

Preview release of GPT-4, featuring improved instruction following, JSON mode, reproducible outputs, and parallel function calling. Trained on data up to December 2023. Heavily rate-limited while in preview.

by

$5.00/1M input tokens$15.00/1M output tokens

-50%

O1 Mini

54.8K Tokens

Experimental mini version of OpenAI’s o1 model, optimized for STEM tasks with efficient performance. Not recommended for production use and may be heavily rate-limited.

by

$1.50/1M input tokens$6.00/1M output tokens

-50%

GPT-4o 2024-08-06

127K Tokens

The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON schema in responses. Maintains high intelligence and efficiency, with enhanced non-English and visual performance.

by

$1.25/1M input tokens$5.00/1M output tokens

-50%

Claude 3.5 Sonnet

8.85M Tokens

Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.

by

$1.50/1M input tokens$7.50/1M output tokens

-50%

Codex Mini

882K Tokens

A fine-tuned version of o4-mini, specifically optimized for use in Codex CLI. Recommended for code-related tasks, with improved performance in code generation and completion.

by

$0.75/1M input tokens$3.00/1M output tokens

-50%

O3

25.3M Tokens

A well-rounded, powerful model from OpenAI, setting new standards in math, science, coding, and visual reasoning. Excels at technical writing and instruction-following, and is designed for multi-step problem solving across text, code, and images. BYOK is required for access.

by

$1.00/1M input tokens$4.00/1M output tokens

-50%

GPT-4o Search Preview

168K Tokens

Specialized GPT-4o variant trained for web search understanding and execution within chat completions, enabling advanced search query comprehension.

by

$1.25/1M input tokens$5.00/1M output tokens

-50%

Claude 3.5 Haiku

5.99M Tokens

Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.

by

$0.40/1M input tokens$2.00/1M output tokens

-50%

O4 Mini

45.5M Tokens

A compact reasoning model in OpenAI’s o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. Supports tool use and demonstrates competitive reasoning and coding performance across benchmarks, outperforming its predecessor o3-mini and approaching o3 in some domains. Well-suited for high-throughput scenarios where latency or cost is critical.

by

$0.55/1M input tokens$2.20/1M output tokens

-50%

Claude Opus 4

35.3M Tokens

Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.

by

$10.00/1M input tokens$50.00/1M output tokens

-50%

ChatGPT-4o Latest

13M Tokens

The continually updated version of OpenAI ChatGPT 4o, always pointing to the current GPT-4o model used by ChatGPT. Incorporates additional RLHF and may differ from the API version. Intended for research and evaluation, not recommended for production as it may be redirected or removed in the future.

by

$2.50/1M input tokens$7.50/1M output tokens

-50%

GPT-4.1 Nano

14M Tokens

The fastest and most cost-effective model in the GPT-4.1 series, designed for tasks demanding low latency such as classification and autocompletion. Maintains a 1 million token context window and delivers exceptional performance at a small size, outperforming even some larger models on key benchmarks.

by

$0.05/1M input tokens$0.20/1M output tokens

-50%

GPT-4.1 Mini

190M Tokens

A mid-sized GPT-4.1 model delivering performance competitive with GPT-4o at substantially lower latency and cost. Retains a 1 million token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.

by

$0.20/1M input tokens$0.80/1M output tokens

-50%

GPT-4.1 Mini (Free)

253M Tokens

A mid-sized GPT-4.1 model delivering performance competitive with GPT-4o at substantially lower latency and cost. Retains a 1 million token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.

by

Free

Claude Sonnet 4

224M Tokens

Claude Sonnet 4 is a next-generation model from Anthropic, significantly enhancing coding and reasoning capabilities over its predecessor. It achieves state-of-the-art performance on SWE-bench, balances capability and computational efficiency, and is optimized for both routine and complex software development projects. Key features include improved codebase navigation, reduced error rates, and increased reliability in following intricate instructions.

by

$1.50/1M input tokens$7.50/1M output tokens

-50%

O3 Mini

7.11M Tokens

A cost-efficient language model from OpenAI, optimized for STEM reasoning tasks, especially in science, mathematics, and coding. Supports the `reasoning_effort` parameter for adjustable thinking time and features significant improvements over its predecessor, with better performance on complex questions and lower latency and cost.

by

$0.55/1M input tokens$2.20/1M output tokens

-50%

Claude 3.7 Sonnet

10.9M Tokens

Claude 3.7 Sonnet is an advanced large language model from Anthropic, featuring improved reasoning, coding, and problem-solving abilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model excels in agentic workflows, front-end development, and full-stack updates, and offers an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following.

by

$1.50/1M input tokens$7.50/1M output tokens

-50%

GPT-4o Mini

461M Tokens

OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving SOTA intelligence and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.

by

$0.07/1M input tokens$0.30/1M output tokens

-50%

GPT-4o

131M Tokens

The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. Maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.

by

$1.25/1M input tokens$5.00/1M output tokens

-50%

Gemini 2.0 Flash

367M Tokens

Gemini Flash 2.0 offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.

by

$0.05/1M input tokens$0.20/1M output tokens

-50%

GPT-4.1

141M Tokens

A flagship large language model from OpenAI, optimized for advanced instruction following, real-world software engineering, and long-context reasoning. Supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 in coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding. Tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

by

$1.00/1M input tokens$4.00/1M output tokens

-50%