Models
Explore a broad selection of AI models available on the NagaAI platform.
Kimi K2.5 is Moonshot AI’s proprietary multimodal model, offering cutting-edge visual coding abilities and supporting a self-directed agent swarm approach. Developed from Kimi K2 and further trained on around 15 trillion mixed visual and text tokens, it achieves excellent results in general reasoning, visual coding, and autonomous tool invocation.
Ideal for high-quality image manipulation, style transfer, and sequential editing workflows.
FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.
FLUX.2 [flex] excels at rendering complex text, typography, and fine details, and supports multi-reference editing within the same unified architecture.
FLUX.2 [klein] 4B is the quickest and most budget-friendly model in the FLUX.2 family, designed for high-throughput workloads while still delivering excellent image quality.
As a SOTA 30B-class model, GLM-4.7-Flash provides a new option that balances efficiency and performance. It has been further optimized for agentic coding scenarios, enhancing coding abilities, long-term task planning, and tool integration, and has demonstrated leading results among open-source models of its size on multiple current public benchmark leaderboards.
GPT-5.2-Codex is an enhanced version of GPT-5.1-Codex, optimized for software engineering and coding tasks. Designed for both interactive sessions and longer, independent execution, it excels at building projects, developing features, debugging, large-scale refactoring, and code review. Compared to its predecessor, 5.2-Codex follows developer instructions more closely and delivers cleaner, higher-quality code. It integrates seamlessly with developer tools—CLI, IDEs, GitHub, and cloud platforms—and adapts its reasoning effort based on task complexity, providing quick responses for simple tasks and maintaining extended performance for larger projects. The model supports structured code reviews, detects critical flaws, validates behavior against tests, and handles multimodal inputs like images for UI work. It is specifically tailored for agentic coding applications.
Qwen-Image-2512 is the latest open-source text-to-image foundational model from Qwen, delivering substantial upgrades over its predecessor, the August Qwen-Image release. This new version significantly enhances human realism—reducing the “AI-generated” look with richer facial detail, more accurate age cues, and better adherence to pose and context instructions. It also renders finer natural detail across landscapes and wildlife, improving textures such as water flow, foliage, mist, and animal fur with more precise strand- and material-level fidelity. In addition, Qwen-Image-2512 improves text rendering and multimodal composition, producing clearer, more accurate typography, stronger layout control, and more reliable generation of complex slide-like designs and infographics. Altogether, these improvements make Qwen-Image-2512 a more photorealistic, detail-faithful, and text-capable image generator suitable for both creative and practical visual production.
Qwen-Image-Edit-2511 is the latest proprietary image editing model from Qwen, delivering substantial upgrades over its predecessor, Qwen-Image-Edit-2509. The new version features notable improvements in editing consistency, especially in multi-subject scenarios and character preservation, allowing for more faithful subject representation across edited images. Integrated support for popular community LoRAs now enables advanced lighting control and novel viewpoint generation natively. In addition, Qwen-Image-Edit-2511 offers enhanced industrial design capabilities, robust geometric reasoning for technical annotations, and improved fusion of multiple images. These advances result in more reliable, visually coherent, and creative image editing—making Qwen-Image-Edit-2511 a powerful and versatile tool for both imaginative and practical visual applications.
Seedream 4.5 is the newest proprietary image generation model from ByteDance. Compared to Seedream 4.0, it offers substantial overall improvements—particularly in editing consistency, where it better maintains subject details, lighting, and color tones. The model also delivers enhanced portrait clarity and improved small-text rendering. Its ability to compose multiple images has been significantly upgraded, and advances in both inference performance and visual aesthetics allow for more accurate and artistically expressive image creation.
MiniMax-M2.1 is a cutting-edge, lightweight large language model designed for coding, agentic workflows, and modern application development. With just 10 billion activated parameters, it offers a significant boost in real-world performance while ensuring low latency, high scalability, and cost-effectiveness. Compared to the previous version, M2.1 delivers more concise, clearer outputs and quicker response times. It excels in multilingual coding, achieving 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, making it an adaptable engine for IDEs, coding tools, and a wide range of assistant applications.
GLM-4.7 is Z.AI’s newest flagship model, offering advancements in two main aspects: improved programming abilities and greater stability in multi-step reasoning and execution. It shows notable progress in handling complex agent tasks, while also providing more natural conversational experiences and enhanced front-end design.
MiMo-V2-Flash is an open-source foundational language model created by Xiaomi, featuring a Mixture-of-Experts architecture with 309 billion total parameters (15 billion active) and a hybrid attention mechanism. It supports hybrid-thinking, offers a 256K context window, and excels in reasoning, coding, and agent-based tasks. Ranking #1 among open-source models worldwide on SWE-bench Verified and Multilingual benchmarks, MiMo-V2-Flash matches the performance of Claude Sonnet 4.5 at just 3.5% of the cost. For best and fastest results when using agentic tools like Claude Code, Cline, or Roo Code, be sure to disable reasoning mode, as the model is extensively optimized for these scenarios.
Gemini 3 Flash Preview is a high-speed, cost-effective reasoning model built for agent-driven workflows, multi-turn conversation, and coding support. Offering near-Pro level performance in both reasoning and tool use, it stands out by delivering significantly lower latency than larger Gemini versions—making it ideal for interactive development, long-running agent loops, and collaborative programming. Compared to Gemini 2.5 Flash, it features notable improvements in reasoning ability, multimodal comprehension, and overall reliability. The model supports a 1M token context window and handles multimodal inputs—text, images, audio, video, and PDFs—with text-based output. Features like configurable reasoning levels, structured outputs, tool integration, and automatic context caching make it a strong choice for users seeking powerful agentic capabilities without the high cost or lag of more extensive models.
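As a minimal sketch of what a multimodal request to a model like this might look like through an OpenAI-compatible client: the base URL and model ID below are placeholders (assumptions), not confirmed NagaAI values, so check them against the platform's own model list.

```python
# Hypothetical sketch: an OpenAI-compatible chat call mixing text and image input.
# The base_url and model ID are assumptions, not confirmed platform values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-nagaai.example/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",  # placeholder model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what this diagram shows in two sentences."},
                {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```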
GPT-Image-1.5 is the flagship image generation and editing model from OpenAI, designed for precise, natural, and fast creation. It reliably follows user instructions down to fine details, preserving critical elements like lighting, composition, and facial likeness across edits and generations. GPT-Image-1.5 excels at a wide range of editing tasks—including addition, removal, stylization, combination, and advanced text rendering—producing images that closely match user intent. With up to 4× faster generation speeds compared to previous versions, it streamlines creative workflows, enabling quick iterations whether you need a simple fix or a total visual transformation. Enhanced integration and lower API costs make GPT-Image-1.5 ideal for marketing, product visualization, ecommerce, and creative tools scenarios, while its dedicated editor and presets provide a delightful, accessible creative space for both practical and expressive image work.
GPT-5.2 Chat (also known as Instant) is the fast and lightweight version of the 5.2 family, built for low-latency chatting while maintaining strong general intelligence. It leverages adaptive reasoning to focus more “thinking” on challenging queries, boosting accuracy in math, coding, and multi-step tasks without sacrificing speed in everyday conversations. The model is naturally warmer and more conversational, with improved instruction following and more stable short-form reasoning. GPT-5.2 Chat is ideal for high-throughput, interactive scenarios where quick response and consistency are more important than in-depth analysis.
GPT-5.2 Pro is OpenAI's most advanced model, featuring significant upgrades in agentic coding and long-context capabilities compared to GPT-5 Pro. It is specifically optimized for handling complex tasks that demand step-by-step reasoning, precise instruction following, and accuracy in critical scenarios. The model supports advanced test-time routing and sophisticated prompt understanding, including user cues like "think hard about this." Key improvements include reduced hallucination and sycophancy, along with stronger performance in coding, writing, and health-related tasks.
GPT-5.2 is the newest frontier-level model in the GPT-5 line, providing enhanced agentic abilities and better long-context performance than GPT-5.1. It employs adaptive reasoning to dynamically distribute computational resources, enabling quick responses to simple requests and deeper analysis for complex challenges. Designed for wide-ranging tasks, GPT-5.2 offers steady improvements in mathematics, programming, science, and tool usage, delivering more coherent long-form responses and increased reliability when using tools.
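If the platform forwards the standard `reasoning_effort` parameter for this family (an assumption, not confirmed), a request that asks for deeper analysis on a hard task might look roughly like this; the endpoint and model ID are placeholders.

```python
# Sketch only: assumes an OpenAI-compatible endpoint, a placeholder model ID, and
# that the standard `reasoning_effort` parameter is passed through for this model.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="gpt-5.2",              # placeholder model ID
    reasoning_effort="high",      # request deeper analysis for a multi-step problem
    messages=[
        {"role": "user", "content": "Prove that the sum of two odd integers is even, step by step."}
    ],
)
print(response.choices[0].message.content)
```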
GPT-5.1-Codex-Max is OpenAI’s newest agentic coding model, created for extended, high-context software development tasks. Built on an enhanced 5.1 reasoning stack, it’s been trained with agentic workflows covering software engineering, mathematics, and research. GPT-5.1-Codex-Max offers faster performance, better reasoning abilities, and increased token efficiency throughout the development process.
Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total). Released under the Apache 2.0 license.
DeepSeek-V3.2 is a large language model optimized for high computational efficiency and strong tool-use reasoning. It features DeepSeek Sparse Attention (DSA), a mechanism that lowers training and inference costs while maintaining quality in long-context tasks. A scalable reinforcement learning post-training framework further enhances reasoning, achieving performance comparable to GPT-5 and earning top results on the 2025 IMO and IOI. V3.2 also leverages large-scale agentic task synthesis to improve reasoning in practical tool-use scenarios, boosting its generalization and compliance in interactive environments.
DeepSeek-TNG-R1T2-Chimera is TNG Tech's second-generation Chimera text-generation model. Built from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints using Assembly-of-Experts merging, this 671B-parameter model combines strengths from all three. Its tri-parent design delivers strong reasoning ability while being about 20% faster than the original R1 and over twice as fast as R1-0528 on vLLM, providing a great balance of cost and performance. The model supports up to 60k-token input (tested up to ~130k) and stable <think> token behavior, making it ideal for long-context analysis, dialogue, and general text generation.
Claude Opus 4.5 is Anthropic’s latest reasoning model, developed for advanced software engineering, complex agent workflows, and extended computer tasks. It features strong multimodal capabilities and performs competitively on real-world coding and reasoning benchmarks, with enhanced resilience against prompt injection. Optimized for efficiency at varying effort levels, it allows developers to balance speed, depth, and token usage according to their specific needs, thanks to a new parameter for controlling token efficiency. Opus 4.5 excels in advanced tool integration, contextual management, and multi-agent coordination, making it ideal for autonomous research, debugging, complex planning, and spreadsheet or browser manipulation. Compared to previous Opus generations, it delivers significant improvements in structured reasoning, long-duration task performance, execution reliability, and alignment, all while reducing token overhead.
Gemini 3 Pro Image Preview (Nano Banana Pro) is Google’s most advanced image generation and editing model, built on Gemini 3 Pro. Compared with the original Nano Banana, it offers much improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model produces context-rich visuals—from infographics and diagrams to cinematic composites—and can incorporate up-to-the-minute information through Search grounding. It leads the industry with sophisticated text rendering in images, handles consistent multi-image blending, and maintains accurate identity preservation for up to five subjects. Nano Banana Pro gives users fine-grained creative controls like localized edits, lighting and focus adjustments, camera transformations, 2K/4K output, and flexible aspect ratios. Tailored for professional design, product visualization, storyboarding, and complex compositions, it remains efficient for everyday image creation needs.
Grok 4.1 Fast Reasoning is xAI's most capable tool-calling model, engineered for production-grade agentic applications with a 2M token context window. Achieving state-of-the-art results on Berkeley Function Calling v4 and leading agentic search benchmarks like Research-Eval Reka (63.9) and FRAMES (87.6), it excels at multi-turn conversations, long-horizon planning, and autonomous task execution. Built through RL training in real-world simulated environments, Grok 4.1 Fast Reasoning delivers exceptional performance on complex enterprise scenarios like customer support and finance while cutting hallucination rates in half compared to its predecessor.
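A hedged sketch of a single tool-calling turn with a model like this: the endpoint, model ID, and the `get_weather` function are hypothetical, and only the OpenAI-style `tools`/`tool_calls` request shape is standard.

```python
# Sketch of one tool-calling turn. Endpoint, model ID, and the weather tool are
# hypothetical; the tools/tool_calls structure follows the OpenAI chat API.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="grok-4.1-fast-reasoning",  # placeholder model ID
    messages=[{"role": "user", "content": "Do I need an umbrella in Berlin today?"}],
    tools=tools,
)

msg = response.choices[0].message
if msg.tool_calls:  # the model chose to call the tool
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(msg.content)
```

A full agent loop would execute the tool, append a `tool` role message with the result, and call the model again; the sketch above only shows the first turn.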
Grok 4.1 Fast Non-Reasoning is xAI's high-speed variant optimized for instant responses and straightforward queries, featuring a 2M token context window. Designed for production workflows requiring rapid inference without deep reasoning overhead, it maintains strong tool-calling capabilities and multi-turn consistency while delivering faster response times. Ideal for real-time applications, customer-facing chatbots, and scenarios where speed is critical, Grok 4.1 Fast Non-Reasoning balances performance with cost-effectiveness for efficient, production-ready agent deployments.
Gemini 3 Pro Preview is Google’s most advanced AI model, setting new records on leading benchmarks like LMArena (1501 Elo), GPQA Diamond (91.9%), and MathArena Apex (23.4%). With powerful reasoning and deep multimodal understanding across text, images, code, video, and audio, Gemini 3 Pro Preview delivers nuanced, context-aware responses and excels at complex problem-solving, scientific analysis, and creative coding tasks.
GPT-5.1-Codex-Mini is a more compact and faster variant of GPT-5.1-Codex.
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It's designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, closely follows developer instructions, and produces cleaner, higher-quality code. Codex integrates into developer environments like the CLI, IDE extensions, GitHub, and cloud tasks. It adapts its reasoning dynamically—providing quick answers for small tasks and sustaining long, multi-hour runs for large projects. The model is trained for structured code reviews, identifying critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs like images or screenshots for UI development and integrates tools for search, dependency installation, and environment setup. Codex is specifically intended for agentic coding applications.
GPT-5.1 Chat (also known as Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.1 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.
GPT-5.1 is the newest top-tier model in the GPT-5 series, featuring enhanced general reasoning, better instruction following, and a more natural conversational tone compared to GPT-5. With adaptive reasoning, it dynamically adjusts its computational effort—responding swiftly to simple queries and diving deeper into complex tasks. Explanations are now clearer and use less jargon, making challenging topics easier to grasp. Designed for a wide range of tasks, GPT-5.1 consistently improves performance in math, coding, and structured analysis, offering more cohesive long-form responses and more reliable tool usage. Its conversation style is warmer and more intuitive, yet still precise. GPT-5.1 stands as the main, fully capable successor to GPT-5.
Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model, extending the K2 series into agentic, long-horizon reasoning. Built on a trillion-parameter Mixture-of-Experts (MoE) architecture, it activates 32 billion parameters per forward pass and supports a 256k-token context window. Optimized for persistent step-by-step thought and dynamic tool use, it enables complex reasoning workflows and stable multi-agent behavior across 200–300 tool calls, setting new open-source records on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench. With MuonClip optimization and large-scale MoE architecture, it delivers strong reasoning depth and high inference efficiency for demanding agentic and analytical tasks.
MiniMax-M2 is a compact, efficient language model with 10B active (230B total) parameters, optimized for coding and agentic workflows. It achieves near-frontier reasoning and tool use with low latency and deployment cost. The model excels in code generation, multi-file editing, compile-run-fix cycles, and automated test repair, showing strong results on SWE-Bench and Terminal-Bench. MiniMax-M2 performs well in agentic benchmarks like BrowseComp and GAIA, handling long-term planning, retrieval, and error recovery. With a small activation footprint, it delivers fast inference and high concurrency, making it ideal for developer tools, agents, and applications that demand cost-effective, responsive reasoning.
Hunyuan Image 3.0 is Tencent’s next-generation native multimodal model, engineered for unified multimodal understanding and generation within an autoregressive framework. Featuring the largest open-source image generation Mixture of Experts (MoE) architecture—80 billion parameters and 64 experts—it delivers state-of-the-art photorealistic imagery and exceptional prompt fidelity. HunyuanImage-3.0 excels at intelligent world knowledge reasoning, automatically enriching sparse prompts with contextually relevant details, and achieves benchmark-leading performance in both text-to-image and integrated multimodal tasks.
Seedream 4.0 is ByteDance’s advanced text-to-image and image editing model, designed for high-speed, high-resolution image generation and robust contextual understanding. It unifies generation and editing in a single architecture, supports complex visual tasks with natural-language instructions, and excels at multi-reference batches and diverse style transfers. Seedream 4.0 stands out for its ability to handle both content creation and modification, offering creative professionals and enterprises an all-in-one, efficient solution for imaginative and knowledge-driven visual tasks.
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, offering near-frontier intelligence with much lower cost and latency than larger Claude models. It matches Claude Sonnet 4’s performance in reasoning, coding, and computer-use tasks, making it ideal for real-time and large-scale applications. Haiku 4.5 introduces controllable reasoning depth, supports summarized or interleaved thought outputs, and enables tool-assisted workflows across coding, bash, web search, and computer-use tools. With over 73% on SWE-bench Verified, it stands among the top coding models while maintaining fast responsiveness for sub-agents, parallel execution, and scaled deployment.
Gemini 2.5 Flash Image, also known as "Nano Banana", is a state-of-the-art image generation model with strong contextual understanding. It supports image generation, editing, and multi-turn conversational interactions.
Mistral Moderation 2411 is a content moderation model from Mistral, offering high-accuracy text moderation across nine safety categories and multiple languages. It is designed for robust, real-time moderation in diverse environments.
Venice Uncensored is a fine-tuned version of Mistral-Small-24B-Instruct-2501, created by dphn.ai in partnership with Venice.ai. This "uncensored" instruct-tuned LLM is built to give users full control over alignment, system prompts, and model behavior. Designed for advanced and unrestricted scenarios, it prioritizes steerability and transparency, removing the default safety and alignment layers present in most mainstream assistant models.
Amazon Nova Pro 1.0 is a versatile multimodal model from Amazon, designed to balance accuracy, speed, and cost across a wide range of tasks. As of December 2024, it delivers state-of-the-art results on key benchmarks like visual question answering (TextVQA) and financial document analysis. The model excels in processing both visual and textual information, though video input is currently not supported.
Amazon Nova Lite 1.0 is a low-cost multimodal model from Amazon, designed for fast processing of image, video, and text inputs to generate text output. It handles real-time customer interactions, document analysis, and visual question answering with high accuracy. With a 300K token input context, it can process multiple images or up to 30 minutes of video in a single input.
Amazon Nova Micro 1.0 is a text-only model in the Amazon Nova family, optimized for ultra-low latency and cost. With a 128K token context length, it excels at text summarization, translation, content classification, interactive chat, and brainstorming, while also offering basic mathematical reasoning and coding abilities.
Ministral 8B is an 8B parameter model with a unique interleaved sliding-window attention pattern for faster and more memory-efficient inference. Optimized for edge use cases, it supports up to 128k context length and delivers strong performance in knowledge and reasoning tasks. Outperforming other models in the sub-10B category, it is ideal for low-latency and privacy-focused applications.
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral, optimized for instruction following, repetition reduction, and improved function calling. It supports both image and text inputs, delivers strong performance across coding, STEM, and vision benchmarks, and is designed for efficient, structured output generation.
Built on Mistral Small 3.2 (2506) and extended with reasoning capabilities through SFT on Magistral Medium traces followed by RL, this is a small, efficient reasoning model with 24B parameters.
GLM-4.6 is the latest version in the GLM series, featuring a longer 200K token context window (up from 128K in GLM-4.5) for handling more complex tasks. It offers improved coding performance with higher benchmark scores and better real-world results, including visually enhanced front-end code generation. The model also delivers stronger reasoning, more effective tool use during inference, better integration within agent frameworks, and more refined, human-like writing style compared to GLM-4.5.
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model, optimized for real-world agents and coding workflows. It achieves state-of-the-art results on coding benchmarks like SWE-bench Verified, with notable improvements in system design, code security, and following specifications. Designed for extended autonomous operation, the model maintains task continuity across sessions and offers fact-based progress tracking. Sonnet 4.5 features enhanced agentic abilities, such as improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With better context tracking and awareness of token usage across tool calls, it excels in multi-context and long-running workflows. Key use cases include software engineering, cybersecurity, financial analysis, research agents, and other areas requiring sustained reasoning and tool use.
DeepSeek-V3.2-Exp is an experimental large language model from DeepSeek, serving as an intermediate step between V3.1 and future architectures. It features DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that enhances training and inference efficiency for long-context tasks while preserving high output quality.
Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B, designed as a powerful coding agent that excels in autonomous programming through tool use and environment interaction, blending strong coding skills with broad general-purpose capabilities.
Gemini 2.5 Flash-Lite Preview September 2025 Checkpoint is a lightweight, high-throughput model from the Gemini 2.5 family, focused on ultra-low latency and cost efficiency. It delivers even faster token generation, concise output, and improved performance on standard benchmarks compared to earlier Flash-Lite models, making it ideal for large-scale, real-time applications.
Gemini 2.5 Flash Preview September 2025 Checkpoint is Google’s high-performance model, built for advanced reasoning, code generation, mathematical tasks, and scientific applications. This version introduces faster, more efficient output and smarter tool use for complex, multi-step workflows.
Qwen3-VL-235B-A22B Thinking is a multimodal model that combines advanced text generation with visual understanding for images and video, specifically optimized for multimodal reasoning in STEM and math. It delivers robust perception, strong spatial (2D/3D) understanding, and long-form visual comprehension, showing competitive performance in public benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction, tool use, following complex instructions in multi-image dialogues, aligning text with video timelines, and automating GUI operations. The model also enables visual coding workflows, such as turning sketches into code and assisting with UI debugging, while maintaining strong text-only capabilities on par with Qwen3 language models. This makes it ideal for use cases like document AI, multilingual OCR, UI/software help, spatial reasoning, and vision-language agent research.
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that combines strong text generation with advanced visual understanding for images and video. Designed for general vision-language tasks like VQA, document parsing, chart and table extraction, and multilingual OCR, the model emphasizes robust perception, spatial (2D/3D) understanding, and long-form visual comprehension, with competitive results on public benchmarks. Qwen3-VL also supports agentic interaction and tool use, following complex instructions in multi-image dialogues, aligning text to video timelines, operating GUIs for automation, and enabling visual coding workflows such as turning sketches into code or debugging UIs. Its strong text-only capabilities match Qwen3 language models, making it suitable for document AI, OCR, UI/software assistance, spatial reasoning, and vision-language agent research.
Qwen-Image-Edit-2509 is the latest iteration of the Qwen-Image-Edit model, released in September. It introduces multi-image editing capabilities by building on the original architecture and further training with image concatenation, supporting combinations like “person + person,” “person + product,” and “person + scene,” with optimal performance for 1 to 3 images. For single-image editing, Qwen-Image-Edit-2509 delivers improved consistency, particularly in person editing (better facial identity preservation and support for various portrait styles), product editing (enhanced product identity retention), and text editing (support for modifying fonts, colors, and materials in addition to content). The model also natively supports ControlNet features, such as depth maps, edge maps, and keypoint maps.
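Assuming an OpenAI-style image edit endpoint is exposed for this model (unverified), an edit request might look roughly as follows; the model ID, file names, and multi-image behavior are assumptions to check against the platform documentation.

```python
# Sketch of an image edit request. Endpoint, model ID, and multi-image support are
# assumptions; the call shape follows the OpenAI images.edit API.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

result = client.images.edit(
    model="qwen-image-edit-2509",     # placeholder model ID
    image=open("person.png", "rb"),   # some deployments accept a list of images for combinations
    prompt="Place the person in a sunlit cafe, keeping face and clothing unchanged.",
)
print(result.data[0].url or "received base64 image data")
```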
GPT-5-Codex is a specialized version of GPT-5 tailored for software engineering and coding tasks. It is suitable for both interactive development sessions and the independent execution of complex engineering projects. The model is capable of building projects from scratch, developing new features, debugging, performing large-scale refactoring, and conducting code reviews. Compared to the standard GPT-5, Codex offers greater steerability, follows developer instructions more closely, and delivers cleaner, higher-quality code.
DeepSeek-V3.1 Terminus is an enhanced version of DeepSeek V3.1 that retains the original model’s capabilities while resolving user-reported issues, such as language consistency and agent functionality. The update further refines the model’s performance in coding and search agent tasks. This large-scale hybrid reasoning model (671B parameters, 37B active) supports both thinking and non-thinking modes. Building on the DeepSeek-V3 foundation, it incorporates a two-phase long-context training approach, allowing for up to 128K tokens, and adopts FP8 microscaling for more efficient inference.
GLM-4.5 is the latest flagship foundation model from Z.AI, specifically designed for agent-based applications. It utilizes a Mixture-of-Experts (MoE) architecture and supports context lengths of up to 128k tokens. GLM-4.5 offers significantly improved capabilities in reasoning, code generation, and agent alignment. It features a hybrid inference mode with two options: a "thinking mode," tailored for complex reasoning and tool usage, and a "non-thinking mode," optimized for instant responses.
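A rough sketch of toggling the hybrid inference mode: the `thinking` field shown in `extra_body` mirrors Z.AI's documented parameter, but whether it is passed through on this platform is an assumption, and the endpoint and model ID are placeholders.

```python
# Sketch: requesting the fast non-thinking mode. The `thinking` field is hypothetical
# for this platform; verify the actual pass-through parameter in the provider docs.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="glm-4.5",  # placeholder model ID
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"thinking": {"type": "disabled"}},  # hypothetical field for non-thinking mode
)
print(response.choices[0].message.content)
```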
A blazing fast variant for instant, cost-effective answers without reasoning traces. Built on the same Grok 4 Fast backbone for unified quality and efficiency, it excels at search, summarization, Q&A, and lightweight agent use. Delivers low latency, reduced token cost, and supports the 2M token context for long inputs. Perfect for rapid and scalable information workflows.
State-of-the-art reasoning model optimized for cost-efficient, high-quality chain-of-thought. Trained end-to-end with tool use and agentic search, it matches top-tier benchmarks like AIME, HMMT, and GPQA at 40% lower token use versus Grok 4. Features a huge 2M token context and native web/X browsing. Ideal for agentic workflows, research, code, logic, and complex multi-step tasks. Offers up to 98% cheaper reasoning versus previous models.
Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model, except that the number of decoding layers has been reduced from 32 to 4. As a result, the model is significantly faster, at the cost of a minor quality degradation.
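Transcription through an OpenAI-compatible client could look like the sketch below; the base URL and model ID are placeholders, while the call shape is the standard OpenAI audio transcription API.

```python
# Sketch of a speech-to-text request (placeholder endpoint and model ID).
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",  # placeholder model ID
        file=audio_file,
    )
print(transcript.text)
```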
Qwen3-Next-80B-A3B-Thinking is a reasoning-focused model that generates structured “thinking” traces by default. Suited for complex multi-step tasks like math proofs, code synthesis, logic, and agentic planning. Compared to earlier Qwen3 models, it’s more stable with long reasoning chains and scales efficiently during inference. Designed for agent frameworks, function calling, retrieval-based workflows, and benchmarks needing step-by-step solutions, it supports detailed completions and faster output through multi-token prediction. Runs only in thinking mode.
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model from the Qwen3-Next series, designed for quick and stable responses without “thinking” traces. It handles complex tasks like reasoning, code generation, knowledge Q&A, and multilingual applications with strong alignment and formatting. Compared to earlier Qwen3 instruct versions, it offers higher throughput and stability, even with long inputs or multi-turn conversations. Ideal for RAG, tool use, and agentic workflows, it delivers consistent and reliable answers with efficient parameter use and fast inference.
Qwen3-Max, the updated model in the Qwen3 series, brings significant advances in reasoning, instruction following, multilingual support, and knowledge coverage compared to the January 2025 version. It offers better accuracy in math, coding, logic, and science, handles complex instructions in Chinese and English more reliably, reduces hallucinations, and gives higher-quality responses in open Q&A and conversations. Supporting 100+ languages, it improves translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool use, though it lacks a specific “thinking” mode.
Kimi K2 0905 is the September update of Kimi K2 0711, a Mixture-of-Experts (MoE) language model from Moonshot AI with 1 trillion parameters and 32 billion active per pass. The long-context window has been expanded to 256k tokens. This release brings improved agentic coding accuracy and generalization across scaffolds, as well as more aesthetic and functional frontend code for web, 3D, and similar tasks. Kimi K2 remains optimized for advanced tool use, reasoning, and code synthesis, excelling in benchmarks like LiveCodeBench, SWE-bench, ZebraLogic, GPQA, Tau2, and AceBench. Its training uses a novel stack with the MuonClip optimizer for stable large-scale MoE training.
Gemini 2.5 Flash Image Preview is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations.
Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code toward high-quality workflows.
DeepSeek-V3.1 is a 671B-parameter hybrid reasoning model (37B active), supporting both "thinking" and "non-thinking" modes via prompt templates. It extends DeepSeek-V3 with two-phase long-context training (up to 128K tokens) and uses FP8 microscaling for efficient inference. The model excels in tool use, code generation, and reasoning, with performance comparable to DeepSeek-R1 but with faster responses. It supports structured tool calling, code agents, and search agents, making it ideal for research and agentic workflows. Successor to DeepSeek V3-0324, it delivers strong performance across diverse tasks.
Qwen2.5 72B is the latest in the Qwen large language model series, offering significant improvements in knowledge, coding, and mathematics. It features specialized expert models, improved instruction following, long-text generation (over 8K tokens), structured data understanding, and robust multilingual support for over 29 languages. The model is optimized for resilience to diverse system prompts and enhanced role-play implementation.
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed for tasks combining visual and textual data. It excels at image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it is ideal for content creation, AI-driven customer service, and research.
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model, demonstrating highly competitive performance compared to leading proprietary models and consistently outperforming state-of-the-art open-source models. It is an instruct finetune of Mixtral 8x22B and is optimized for complex reasoning and instruction-following tasks. For more information, see the [official release](https://wizardlm.github.io/WizardLM2/).
A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.
GPT-5 Chat is tailored for advanced, natural, and context-aware conversations in enterprise environments. It leverages the latest advancements in OpenAI’s conversational AI, supporting multimodal and dynamic dialogue with enhanced context retention and user intent understanding.
The smallest and fastest member of the GPT-5 family, optimized for developer tools, rapid user interactions, and ultra-low latency environments. While it offers limited reasoning depth compared to larger models, GPT-5-Nano preserves essential instruction-following and safety mechanisms. It is the successor to GPT-4.1-nano and is best suited for real-time, cost-sensitive, or embedded applications.
OpenAI’s most advanced large language model, engineered for high-stakes applications requiring step-by-step reasoning, precise instruction following, and robust code generation. GPT-5 introduces major improvements in factual accuracy, user intent understanding, and hallucination reduction. It supports advanced prompt routing, user-specified intent (such as "think hard about this"), and is optimized for complex workflows in coding, writing, and health-related domains.
OpenAI’s 21B-parameter open-weight Mixture-of-Experts (MoE) model, released under the Apache 2.0 license. Features 3.6B active parameters per forward pass, optimized for low-latency inference and deployability on consumer or single-GPU hardware. Trained in OpenAI’s Harmony response format, it supports reasoning level configuration, fine-tuning, and agentic capabilities such as function calling and structured outputs.
An open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI, designed for high-reasoning, agentic, and general-purpose production use cases. Activates 5.1B parameters per forward pass and is optimized for single H100 GPU deployment with native MXFP4 quantization. Supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.
Qwen-Image is a foundation image generation model from the Qwen team, excelling at high-fidelity text rendering, complex text integration (including English and Chinese), and diverse artistic styles. It supports advanced editing features such as style transfer, object manipulation, and human pose editing, and is suitable for both image generation and understanding tasks.
Flux-1-Krea-Dev is a 12B parameter rectified flow transformer developed by Black Forest Labs and Krea, focused on aesthetic photography and efficient, open-weight image generation. It leverages guidance distillation for efficient inference and is released with open weights for research and creative workflows.
Stable Diffusion 3 Large is the latest and most advanced addition to the Stable Diffusion family, featuring 8 billion parameters for intricate text understanding, typography, and highly detailed image generation. It is designed for creative and professional use cases requiring high fidelity and control.
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. Activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, and is instruction-tuned for step-by-step reasoning, tool use, agentic workflows, and multilingual tasks.
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. Optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. Features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts), and supports variable pricing based on context length.
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. Optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. Supports a native 262K context length and delivers significant gains in knowledge coverage, long-context reasoning, and coding benchmarks.
Gemini 2.5 Flash-Lite is a streamlined reasoning model from the Gemini 2.5 family, designed for extremely low latency and cost-effectiveness. It delivers higher throughput, quicker token generation, and enhanced performance on standard benchmarks compared to previous Flash models.
Flux-1-Kontext-Max is a premium text-based image editing model from Black Forest Labs, delivering maximum performance and advanced typography generation for transforming images through natural language prompts. It is designed for high-end creative and professional use.
Flux-1-Kontext-Pro is a state-of-the-art text-based image editing model from Black Forest Labs, providing high-quality, prompt-adherent output for transforming images using natural language. It is optimized for consistent results and advanced editing tasks.
Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.
The April 2024 release of GPT-4 Turbo, supporting vision, JSON mode, and function calling. Trained on data up to December 2023, optimized for advanced multimodal tasks.
Gemini-Embedding-001 is Google’s top-ranked multilingual embedding model, supporting over 100 languages and flexible output dimensions (3072, 1536, or 768). It is optimized for semantic search, clustering, and recommendations, and leverages Matryoshka Representation Learning for efficient, high-quality embeddings.
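A sketch of requesting reduced-dimension embeddings: the endpoint and model ID are placeholders, and mapping the standard `dimensions` parameter onto this model's 3072/1536/768 options is an assumption to verify.

```python
# Sketch: embeddings at a reduced output dimension (placeholder endpoint and model ID;
# `dimensions` is the standard OpenAI embeddings parameter).
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

resp = client.embeddings.create(
    model="gemini-embedding-001",  # placeholder model ID
    input=["semantic search example", "clustering example"],
    dimensions=768,                # smaller vectors trade some quality for storage and speed
)
print(len(resp.data), "vectors of length", len(resp.data[0].embedding))
```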
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
Grok 4 is xAI’s latest reasoning model, featuring a 256k context window and support for parallel tool calling, structured outputs, and both image and text inputs. Designed for high-throughput, complex reasoning tasks, with pricing that scales for large token requests.
Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per token), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.
A lightweight, high-speed model from xAI, engineered for logic-based tasks that do not require deep domain knowledge. Grok-3-mini is optimized for rapid response and efficient reasoning, making it ideal for applications where speed and concise logic are prioritized over extensive context or specialized expertise. The model exposes raw thinking traces, providing transparency into its decision-making process and enabling advanced debugging or educational use cases.
MiniMax-M1 is a large-scale, open-weight reasoning model with 456B total parameters and 45.9B active per token, leveraging a hybrid Mixture-of-Experts (MoE) architecture and a custom "lightning attention" mechanism. It supports context windows up to 1 million tokens and is optimized for long-context understanding, software engineering, agentic tool use, and mathematical reasoning. The model is trained via a custom reinforcement learning pipeline (CISPO) and demonstrates strong performance on FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench.
Grok 3 is xAI’s flagship model, excelling at enterprise use cases such as data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science, and is optimized for high-accuracy, real-world applications.
Gemini 2.0 Flash Lite is optimized for extremely fast response times and low cost, while maintaining the quality of larger models. Ideal for real-time and large-scale applications.
Eleven-Multilingual-v2 is ElevenLabs’ most advanced multilingual text-to-speech model, delivering high-quality voice synthesis across a wide range of languages with improved realism and expressiveness. It is optimized for both accuracy and naturalness in multilingual scenarios.
Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. Employs “thinking” capabilities for nuanced context handling and achieves top-tier performance on multiple benchmarks, including first-place on the LMArena leaderboard.
Sonar Reasoning is a Perplexity model based on DeepSeek R1, designed for long chain-of-thought reasoning with built-in web search. It is uncensored, hosted in US datacenters, and allows developers to leverage extended reasoning for complex queries, making it suitable for research and knowledge-intensive applications.
DALL-E 3 is OpenAI’s third-generation text-to-image model, offering enhanced detail, accuracy, and the ability to understand complex prompts. It excels at generating realistic and creative images, handling intricate details like text and human anatomy, and supports various aspect ratios for flexible output.
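A minimal generation request in the OpenAI images API style; the base URL is a placeholder, and the size and quality options shown are those documented for DALL-E 3.

```python
# Sketch of a text-to-image request (placeholder endpoint; DALL-E 3 style options).
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

image = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor poster of a lighthouse at dawn with the caption 'Harbor Lights'",
    size="1024x1792",   # portrait aspect ratio; 1024x1024 and 1792x1024 are also supported
    quality="hd",
    n=1,
)
print(image.data[0].url)
```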
Magistral is Mistral's first reasoning model, designed for general-purpose use cases that require extended thought processing and high accuracy. It excels in multi-step challenges such as legal research, financial forecasting, software development, and creative storytelling, where transparency and precision are critical.
A text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text into natural-sounding spoken audio.
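A short text-to-speech sketch: the endpoint, model ID, and voice name are placeholders, and the streaming-response helper follows the OpenAI audio API.

```python
# Sketch of a text-to-speech request written to an MP3 file (placeholder endpoint and IDs).
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",  # placeholder model ID
    voice="alloy",
    input="Your order has shipped and should arrive on Thursday.",
) as speech:
    speech.stream_to_file("notification.mp3")
```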
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models, while operating at three times the speed on equivalent hardware.
Imagen-4 is Google’s latest text-to-image model, engineered for photorealistic quality, improved fine details, advanced spelling and typography rendering, and high accuracy across diverse art styles. It includes SynthID watermarking for AI-generated content identification and is benchmarked as a leader in human preference evaluations.
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. Trained with large-scale reinforcement learning for chain-of-thought reasoning, it is optimized for math, science, programming, and other STEM tasks, consistently achieving PhD-level accuracy on industry benchmarks.
Scribe-v1 is a cutting-edge speech recognition model from ElevenLabs, designed for accurate speech-to-text transcription in 99 languages. It excels at handling real-world audio and consistently outperforms models such as Gemini 2.0 Flash and Whisper Large V3, achieving notably low word error rates even in underserved languages.
Eleven-v3 is ElevenLabs’ most expressive text-to-speech model, supporting over 70 languages, multi-speaker dialogue, and advanced audio tags such as [excited], [whispers], [laughs], and [sighs]. It provides unmatched realism and control, enabling dynamic, context-aware conversations with improved expressiveness and fine-grained audio control.
Mistral Saba is a 24B-parameter language model specifically developed for the Middle East and South Asia. It delivers accurate and contextually relevant responses in multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. The model is trained on curated regional datasets and is optimized for multilingual and regional applications.
Google’s latest open-source multimodal model, Gemma 3 27B, supports vision-language input and text outputs, handles context windows up to 128k tokens, and understands over 140 languages. Offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.
Flux-1-Schnell is a high-speed, open-source text-to-image model from Black Forest Labs, optimized for rapid, high-quality image generation in just a few steps. It is ideal for applications where speed and efficiency are critical.
Qwen-Turbo is a 1M context model based on Qwen2.5, designed for fast speed and low cost. It is suitable for simple tasks and applications where efficiency and affordability are prioritized over deep reasoning.
GPT-4.1, a flagship model for advanced instruction following, software engineering, and long-context reasoning. Supports a 1 million token context window and is tuned for precise code diffs, agent reliability, and high recall in large document contexts.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal model, supporting both text and image inputs with text outputs. Delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.
Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. Supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. Demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects.
Qwen-Max is a large-scale Mixture-of-Experts (MoE) model from Qwen, based on Qwen2.5, and provides the best inference performance among Qwen models, especially for complex multi-step tasks. Pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), it is designed for high-accuracy, high-recall applications. The exact parameter count is undisclosed.
Meta’s Llama 3 70B instruct-tuned model, optimized for high-quality dialogue and demonstrating strong performance in human evaluations. Suitable for advanced conversational AI tasks.
Llama 3.2 3B is a 3-billion-parameter multilingual model optimized for advanced NLP tasks such as dialogue generation, reasoning, and summarization. It supports eight languages and is trained on 9 trillion tokens, excelling in instruction-following, complex reasoning, and tool use.
The latest GPT-4 Turbo model with vision capabilities, supporting JSON mode and function calling. Trained on data up to December 2023, it is optimized for high-throughput, multimodal applications.
Llama 3.2 1B is a 1-billion-parameter language model focused on efficient natural language tasks, including summarization, dialogue, and multilingual text analysis. Its small size allows for deployment in low-resource environments while maintaining strong performance across eight core languages.
Preview release of GPT-4 Turbo, featuring improved instruction following, JSON mode, reproducible outputs, and parallel function calling. Trained on data up to December 2023. Heavily rate-limited while in preview.
The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON schema in responses. Maintains high intelligence and efficiency, with enhanced non-English and visual performance.
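A sketch of a structured-output request with a JSON schema: the base URL is a placeholder and the schema is purely illustrative, while the `response_format` shape follows OpenAI's structured outputs API.

```python
# Sketch: constraining the response to a JSON schema (placeholder endpoint; illustrative schema).
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.example-nagaai.example/v1", api_key="YOUR_API_KEY")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year_founded": {"type": "integer"},
    },
    "required": ["name", "year_founded"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # placeholder model ID
    messages=[{"role": "user", "content": "Extract the company and founding year: 'Acme Corp was founded in 1947.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "company_info", "schema": schema, "strict": True},
    },
)
print(json.loads(response.choices[0].message.content))
```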
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Features early fusion for native multimodality and a 1 million token context window.
Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. Supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. Fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects.
Sonar Pro is an enterprise-grade API from Perplexity, built for advanced, multi-step queries with added extensibility. It can handle longer and more nuanced searches, follow-up questions, and provides double the number of citations per search compared to the standard Sonar model. The model is optimized for large context windows and comprehensive information retrieval.
DeepSeek Prover V2 is a 671B parameter model, speculated to be geared towards logic and mathematics. Likely an upgrade from DeepSeek-Prover-V1.5, but released without an official announcement or detailed documentation.
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters. Optimized for multilingual dialogue, it outperforms many open-source and closed chat models on industry benchmarks. Supported languages include English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Kandinsky-3.1 is a large text-to-image diffusion model developed by Sber and AIRI, featuring 11.9 billion parameters. The model consists of a text encoder, U-Net, and decoder, enabling high-quality, detailed image generation from text prompts. It is trained on extensive datasets and is designed for both creative and scientific applications.
Recraft-v3 is a state-of-the-art text-to-image model from Recraft, capable of generating images from long textual inputs in a wide range of styles. It is benchmarked as a leader in image generation and is designed for creative and professional applications.
Grok-2-Aurora is an autoregressive, mixture-of-experts model from xAI, trained on billions of text and image examples. It excels at photorealistic rendering, accurately following text instructions, and complex scene generation, leveraging deep world understanding built during training.
Omni-Moderation is OpenAI’s newest multimodal content moderation model, available through the Moderation API. It is designed to identify potentially harmful content in both text and images, offering improved accuracy and granular control, especially in non-English languages.
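A hedged illustration of screening mixed text-and-image content through an OpenAI-compatible moderation endpoint; the base URL, model identifier, and image URL below are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.your-gateway.example/v1", api_key="YOUR_API_KEY")  # placeholders

# Text and an image can be checked in a single request.
result = client.moderations.create(
    model="omni-moderation-latest",  # assumed identifier
    input=[
        {"type": "text", "text": "Message to screen before it is posted."},
        {"type": "image_url", "image_url": {"url": "https://example.com/upload.png"}},
    ],
)

verdict = result.results[0]
print(verdict.flagged)          # True if any category was triggered
print(verdict.category_scores)  # per-category scores for granular thresholds
```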
DeepSeek V3 is a 685B-parameter, mixture-of-experts model and the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the original DeepSeek V3 release and demonstrates strong performance across a variety of tasks.
DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1, using more compute and advanced post-training techniques to push its reasoning and inference capabilities to the level of flagship models like o3 and Gemini 2.5 Pro. It excels on math, programming, and logic leaderboards, and a distilled 8B-parameter variant rivals much larger models on key benchmarks.
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
Mistral Large 2 (version mistral-large-2407) is Mistral AI’s flagship model, supporting dozens of languages—including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean—and over 80 coding languages. It features a long context window for precise information recall and is optimized for reasoning, code, JSON, and chat tasks.
A fine-tuned version of o4-mini, specifically optimized for use in Codex CLI. Recommended for code-related tasks, with improved performance in code generation and completion.
Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. It is capable of understanding documents, charts, and natural images, and is available under both research and commercial licenses. The model is designed for advanced document and image analysis tasks.
Sonar Reasoning is a Perplexity model based on DeepSeek R1, designed for long chain-of-thought reasoning with built-in web search. It is uncensored, hosted in US datacenters, and allows developers to leverage extended reasoning for complex queries, making it suitable for research and knowledge-intensive applications.
GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal model, supporting both text and image inputs with text outputs. Delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.
The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON schema in responses. Maintains high intelligence and efficiency, with enhanced non-English and visual performance.
Eleven-Monolingual-v1 is an English-only TTS model from ElevenLabs, providing clear, natural-sounding voice output for a variety of English-language applications.
A speech-to-text model using GPT-4o for transcribing audio. It offers improved word error rate, better language recognition, and higher accuracy compared to the original Whisper models. Use it for more precise transcripts.
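A minimal transcription sketch, assuming an OpenAI-compatible audio endpoint; the base URL, model identifier, and file name are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.your-gateway.example/v1", api_key="YOUR_API_KEY")  # placeholders

# "meeting.mp3" stands in for any local audio file you want transcribed.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # assumed identifier
        file=audio_file,
    )

print(transcript.text)
```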
Midjourney is a generative AI model developed by Midjourney, Inc., designed to create images from text descriptions (prompts). It is widely used for creative and design purposes, offering high-quality, imaginative visuals for a variety of applications.
Stable Diffusion 3.5 Large is a powerful, text-to-image AI model from Stability AI, utilizing a Multimodal Diffusion Transformer (MMDiT) architecture with 8.1 billion parameters. It excels at generating high-resolution images (up to 1 megapixel) in diverse styles, with strong prompt adherence and advanced detail rendering.
A well-rounded, powerful model from OpenAI, setting new standards in math, science, coding, and visual reasoning. Excels at technical writing and instruction-following, and is designed for multi-step problem solving across text, code, and images. BYOK is required for access.
Whisper Large v3 is OpenAI’s state-of-the-art model for automatic speech recognition (ASR) and speech translation. Trained on over 5 million hours of labeled data, it demonstrates strong generalization across datasets and domains, excelling in zero-shot transcription and translation tasks.
Meta’s Llama 3 8B instruct-tuned model, optimized for high-quality dialogue and demonstrating strong performance in human evaluations. Ideal for efficient conversational AI.
Meta’s Llama 3.1 8B instruct-tuned model, designed for fast and efficient dialogue. It performs strongly in human evaluations and is ideal for applications requiring a balance of speed and quality.
Meta’s Llama 3.1 70B instruct-tuned model, optimized for high-quality dialogue use cases. It demonstrates strong performance in human evaluations and is suitable for a wide range of conversational AI applications.
xAI’s Grok 2 Vision 1212 is a next-generation vision-language model designed for advanced image-based AI applications. It features robust visual comprehension, refined instruction-following, and strong multilingual support. The model excels at object recognition, style analysis, and visual reasoning, empowering developers to build intuitive, visually aware applications. Enhanced steerability and reasoning capabilities make it a solid foundation for next-generation image solutions. For more details, see the official [xAI announcement](https://x.ai/blog/grok-1212).
Flux-1.1-Pro is an enhanced version of Flux 1.0 Pro from Black Forest Labs, offering faster generation speeds, improved image quality, and better prompt adherence. It is optimized for both developer and commercial use.
Flux-1-Dev is an open-weight, non-commercial text-to-image model from Black Forest Labs, designed for high-quality image generation with a 12B parameter rectified flow transformer. It is optimized for research and creative experimentation.
Ideogram-v2-turbo is the latest image generation model from Ideogram, designed for fast production of realistic visuals, graphic designs, and typography. It combines rapid image generation with high quality, making it ideal for posters, logos, and creative content.
Sonar is Perplexity’s lightweight, affordable, and fast question-answering model, now featuring citations and customizable sources. It is designed for companies seeking to integrate rapid, citation-enabled Q&A features optimized for speed and simplicity.
Text-Embedding-3-Small is OpenAI’s efficient, compact embedding model, designed to convert text into numerical representations for semantic tasks such as search, clustering, and recommendations. It offers improved performance and cost-effectiveness compared to previous models, with low latency and storage requirements.
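A short sketch of using embeddings for semantic search over a few documents, assuming an OpenAI-compatible embeddings endpoint; the base URL and sample texts are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.your-gateway.example/v1", api_key="YOUR_API_KEY")  # placeholders

docs = ["How do I reset my password?", "Shipping usually takes 3-5 business days."]
query = "I forgot my login credentials"

# Embed the documents and the query in one batch; the query is the last vector.
resp = client.embeddings.create(model="text-embedding-3-small", input=docs + [query])
vectors = [item.embedding for item in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# Rank documents by similarity to the query embedding.
scores = [cosine(v, vectors[-1]) for v in vectors[:-1]]
print(max(zip(scores, docs)))  # most relevant document and its score
```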
Sonar Reasoning Pro is Perplexity’s premier reasoning model, powered by DeepSeek R1 with Chain of Thought (CoT) capabilities. Designed for advanced use cases, it supports in-depth, multi-step queries with a larger context window and can surface more citations per search, enabling more comprehensive and extensible responses. Pricing includes Perplexity search costs for integrated web research.
Stable Diffusion XL (SDXL) is a powerful text-to-image generation model from Stability AI, featuring a 3x larger UNet, dual text encoders (OpenCLIP ViT-bigG/14 alongside the original CLIP ViT-L), and a two-stage process for generating highly detailed, controllable images. It introduces size and crop-conditioning for greater control and quality in image generation.
Specialized GPT-4o variant trained for web search understanding and execution within chat completions, enabling advanced search query comprehension.
Kandinsky-3.1 is a large text-to-image diffusion model developed by Sber and AIRI, featuring 11.9 billion parameters. The model consists of a text encoder, U-Net, and decoder, enabling high-quality, detailed image generation from text prompts. It is trained on extensive datasets and is designed for both creative and scientific applications.
Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. Supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. Demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities.
Mythomax L2 13B is one of the highest performing and most popular fine-tunes of Llama 2 13B, known for its rich descriptive capabilities and roleplay performance. It is widely used in creative and narrative-driven applications.
Text-Embedding-3-Large is OpenAI’s most capable embedding model, supporting both English and non-English text tasks. It produces high-dimensional embeddings (up to 3072 dimensions) for advanced semantic similarity, search, and clustering, and allows flexible trade-offs between performance and resource usage.
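The performance/resource trade-off mentioned above is exposed through the standard `dimensions` parameter of the text-embedding-3 family; a minimal sketch with placeholder credentials, assuming the platform passes the parameter through.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.your-gateway.example/v1", api_key="YOUR_API_KEY")  # placeholders

# Request a shortened 1024-dimensional vector instead of the full 3072 dimensions
# to trade a little retrieval accuracy for smaller storage and faster search.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="multilingual semantic search query",
    dimensions=1024,
)
print(len(resp.data[0].embedding))  # 1024
```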
Claude 3 Opus is Anthropic's most powerful model for highly complex tasks, offering top-level performance, intelligence, fluency, and understanding. It is optimized for advanced research, coding, and multimodal applications, and is benchmarked as a leader in the Claude 3 family.
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
Sonar Deep Research is a research-focused model from Perplexity, engineered for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers information. This enables comprehensive report generation in domains like finance, technology, health, and current events. The model’s pricing is based on prompt tokens, citation tokens, and the number of searches and reasoning tokens used during its extensive research phase.
Stable Diffusion XL (SDXL) is a powerful text-to-image generation model from Stability AI, featuring a 3x larger UNet, dual text encoders (OpenCLIP ViT-bigG/14 alongside the original CLIP ViT-L), and a two-stage process for generating highly detailed, controllable images. It introduces size and crop-conditioning for greater control and quality in image generation.
Flux-1.1-Pro-Ultra is a high-resolution, high-speed image generation model from Black Forest Labs, capable of producing images up to 4 million pixels (4MP). It is designed for professional printing, fine art, and applications requiring exceptional detail and speed.
The highly anticipated 400B class of Llama 3 is here, offering a 128k context window and impressive evaluation scores. This 405B instruct-tuned version is optimized for high-quality dialogue and demonstrates strong performance compared to leading closed-source models, including GPT-4o and Claude 3.5 Sonnet.
Flux-1-Pro is an advanced text-to-image model from Black Forest Labs, generating high-quality, realistic images and clear text. It is suitable for a wide range of applications, including commercial and creative projects.
GPT-4.1, a flagship model for advanced instruction following, software engineering, and long-context reasoning. Supports a 1 million token context window and is tuned for precise code diffs, agent reliability, and high recall in large document contexts.
Eleven-Turbo-v2 is an English-optimized TTS model from ElevenLabs, designed for fast, high-quality speech synthesis with low latency. It is ideal for real-time applications and interactive voice systems.
Whisper Large v3 is OpenAI’s state-of-the-art model for automatic speech recognition (ASR) and speech translation. Trained on over 5 million hours of labeled data, it demonstrates strong generalization across datasets and domains, excelling in zero-shot transcription and translation tasks.
A compact reasoning model in OpenAI’s o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. Supports tool use and demonstrates competitive reasoning and coding performance across benchmarks, outperforming its predecessor o3-mini and approaching o3 in some domains. Well-suited for high-throughput scenarios where latency or cost is critical.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.
The continually updated version of OpenAI ChatGPT 4o, always pointing to the current GPT-4o model used by ChatGPT. Incorporates additional RLHF and may differ from the API version. Intended for research and evaluation, not recommended for production as it may be redirected or removed in the future.
DALL-E 3 is OpenAI’s third-generation text-to-image model, offering enhanced detail, accuracy, and the ability to understand complex prompts. It excels at generating realistic and creative images, handling intricate details like text and human anatomy, and supports various aspect ratios for flexible output.
Phi-4 is a 14B-parameter model from Microsoft Research, designed for complex reasoning tasks and efficient operation in low-memory or rapid-response scenarios. Trained on a mix of high-quality synthetic and curated data, it is optimized for English language inputs and demonstrates strong instruction following and safety standards. For more details, see the [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905).
The fastest and most cost-effective model in the GPT-4.1 series, designed for tasks demanding low latency such as classification and autocompletion. Maintains a 1 million token context window and delivers exceptional performance at a small size, outperforming even some larger models on key benchmarks.
A mid-sized GPT-4.1 model delivering performance competitive with GPT-4o at substantially lower latency and cost. Retains a 1 million token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.
The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. Maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.
Mistral Medium 3 is a high-performance, enterprise-grade language model that balances state-of-the-art reasoning and multimodal capabilities with significantly reduced operational cost. It excels in coding, STEM reasoning, and enterprise adaptation, and is optimized for scalable deployments across professional and industrial use cases, including hybrid and on-prem environments.
Claude 3.5 Haiku is engineered for real-time applications, delivering quick response times and enhanced capabilities in speed, coding accuracy, and tool use. It is highly suitable for dynamic environments such as chat interactions, immediate coding suggestions, and customer service bots. This entry currently points to Claude 3.5 Haiku (2024-10-22).
Claude Sonnet 4 is a next-generation model from Anthropic, significantly enhancing coding and reasoning capabilities over its predecessor. It achieves state-of-the-art performance on SWE-bench, balances capability and computational efficiency, and is optimized for both routine and complex software development projects. Key features include improved codebase navigation, reduced error rates, and increased reliability in following intricate instructions.
Llama Guard 4 is a multimodal content safety classifier derived from Llama 4 Scout, fine-tuned for both prompt and response classification. It supports content moderation in English and multiple other languages, and can classify mixed text-and-image prompts. The model is aligned with the MLCommons hazards taxonomy and is integrated into the Llama Moderations API for robust safety classification in text and images.
A cost-efficient language model from OpenAI, optimized for STEM reasoning tasks, especially in science, mathematics, and coding. Supports the `reasoning_effort` parameter for adjustable thinking time and features significant improvements over its predecessor, with better performance on complex questions and lower latency and cost.
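A brief sketch of adjusting thinking time via the `reasoning_effort` parameter on an OpenAI-compatible endpoint; the base URL and model identifier are assumed placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.your-gateway.example/v1", api_key="YOUR_API_KEY")  # placeholders

response = client.chat.completions.create(
    model="o3-mini",           # assumed identifier
    reasoning_effort="high",   # accepted values are "low", "medium", and "high"
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
)
print(response.choices[0].message.content)
```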
OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving state-of-the-art intelligence for its size and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral, optimized for instruction following, repetition reduction, and improved function calling. It supports both image and text inputs, delivers strong performance across coding, STEM, and vision benchmarks, and is designed for efficient, structured output generation.
OpenAI’s new state-of-the-art image generation model. This is a natively multimodal language model that accepts both text and image inputs and produces image outputs. It powers image generation in ChatGPT, offering exceptional prompt adherence, a high level of detail, and strong overall image quality.
Flux-1-Schnell is a high-speed, open-source text-to-image model from Black Forest Labs, optimized for rapid, high-quality image generation in just a few steps. It is ideal for applications where speed and efficiency are critical.
Command A is an open-weights 111B parameter model from Cohere, featuring a 256k context window and optimized for agentic, multilingual, and coding use cases. It delivers high performance with minimal hardware costs, excelling in business-critical workflows that require advanced reasoning, tool use, and language understanding across multiple languages.
Gemini 2.0 Flash offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.
QwQ-32B is the medium-sized reasoning model in the Qwen series, designed for advanced thinking and reasoning tasks. It achieves competitive performance against state-of-the-art models like DeepSeek-R1 and o1-mini, and is particularly strong on hard problems requiring deep analytical skills.
Codestral is Mistral’s cutting-edge language model for coding, specializing in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation. It is optimized for developer productivity and supports a wide range of programming languages and code-related tasks.
Mistral Large 2 2411 is an updated release of Mistral Large 2, featuring notable improvements in long context understanding, a new system prompt, and more accurate function calling. It is designed for advanced enterprise and research applications requiring high reliability and performance.
Claude 3.7 Sonnet is an advanced large language model from Anthropic, featuring improved reasoning, coding, and problem-solving abilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model excels in agentic workflows, front-end development, and full-stack updates, and offers an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following.
OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving state-of-the-art intelligence for its size and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters. Optimized for multilingual dialogue, it outperforms many open-source and closed chat models on industry benchmarks. Supported languages include English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. Maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.
Eleven-Multilingual-v2 is ElevenLabs’ most advanced multilingual text-to-speech model, delivering high-quality voice synthesis across a wide range of languages with improved realism and expressiveness. It is optimized for both accuracy and naturalness in multilingual scenarios.
A text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text into natural-sounding spoken audio.
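A minimal text-to-speech sketch, assuming an OpenAI-compatible audio endpoint; the base URL, model identifier, and voice name are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.your-gateway.example/v1", api_key="YOUR_API_KEY")  # placeholders

speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",  # assumed identifier
    voice="alloy",            # example voice name
    input="Your order has shipped and should arrive on Thursday.",
)

# The response body is the audio bytes; write them to a local file.
with open("notification.mp3", "wb") as f:
    f.write(speech.content)
```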
Gemini 2.0 Flash offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.
Sonar is Perplexity’s lightweight, affordable, and fast question-answering model, now featuring citations and customizable sources. It is designed for companies seeking to integrate rapid, citation-enabled Q&A features optimized for speed and simplicity.
A flagship large language model from OpenAI, optimized for advanced instruction following, real-world software engineering, and long-context reasoning. Supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 in coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding. Tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.
Text-Embedding-Ada-002 is a widely used text embedding model from OpenAI, converting text into semantic vectors for tasks like search, clustering, recommendations, and classification. It is known for strong performance and efficiency, making it a standard choice for embedding applications.
Eleven-Multilingual-v1 is an earlier multilingual TTS model from ElevenLabs, offering robust support for multiple languages and reliable, natural-sounding voice generation.