Models | NagaAI

Dashboard Playground Models Docs

Models

Explore a wide range of AI models available through the NagaAI platform.

gemini-2.5-flash-image-preview

Gemini 2.5 Flash Image Preview is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations.

$0.15/1M input tokens$1.25/1M output tokens

gpt-5-chat-latest

GPT-5 Chat is tailored for advanced, natural, and context-aware conversations in enterprise environments. It leverages the latest advancements in OpenAI’s conversational AI, supporting multimodal and dynamic dialogue with enhanced context retention and user intent understanding.

$0.63/1M input tokens$5.00/1M output tokens

gpt-5-nano-2025-08-07

The smallest and fastest member of the GPT-5 family, optimized for developer tools, rapid user interactions, and ultra-low latency environments. While it offers limited reasoning depth compared to larger models, GPT-5-Nano preserves essential instruction-following and safety mechanisms. It is the successor to GPT-4.1-nano and is best suited for real-time, cost-sensitive, or embedded applications.

$0.02/1M input tokens$0.20/1M output tokens

gpt-5-mini-2025-08-07

A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.

$0.13/1M input tokens$1.00/1M output tokens

gpt-5-2025-08-07

OpenAI’s most advanced large language model, engineered for high-stakes applications requiring step-by-step reasoning, precise instruction following, and robust code generation. GPT-5 introduces major improvements in factual accuracy, user intent understanding, and hallucination reduction. It supports advanced prompt routing, user-specified intent (such as "think hard about this"), and is optimized for complex workflows in coding, writing, and health-related domains.

$0.63/1M input tokens$5.00/1M output tokens

claude-opus-4.1-20250805

Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.

$10.00/1M input tokens$50.00/1M output tokens

qwen-image

Qwen-Image is a foundation image generation model from the Qwen team, excelling at high-fidelity text rendering, complex text integration (including English and Chinese), and diverse artistic styles. It supports advanced editing features such as style transfer, object manipulation, and human pose editing, and is suitable for both image generation and understanding tasks.

flux-1-kontext-max

Flux-1-Kontext-Max is a premium text-based image editing model from Black Forest Labs, delivering maximum performance and advanced typography generation for transforming images through natural language prompts. It is designed for high-end creative and professional use.

byblack-forest-labs

flux-1-kontext-pro

Flux-1-Kontext-Pro is a state-of-the-art text-based image editing model from Black Forest Labs, providing high-quality, prompt-adherent output for transforming images using natural language. It is optimized for consistent results and advanced editing tasks.

byblack-forest-labs

grok-4-0709

Grok 4 is xAI’s latest reasoning model, featuring a 256k context window and support for parallel tool calling, structured outputs, and both image and text inputs. Designed for high-throughput, complex reasoning tasks, with pricing that scales for large token requests.

$2.25/1M input tokens$11.25/1M output tokens

gemini-2.5-flash

Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.

$0.15/1M input tokens$1.25/1M output tokens

gemini-2.5-pro

Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. Employs “thinking” capabilities for nuanced context handling and achieves top-tier performance on multiple benchmarks, including first-place on the LMArena leaderboard.

$0.94/1M input tokens$6.25/1M output tokens

omni-moderation-2024-09-26

Omni-Moderation is OpenAI’s newest multimodal content moderation model, available through the Moderation API. It is designed to identify potentially harmful content in both text and images, offering improved accuracy and granular control, especially in non-English languages.

mistral-moderation-2411

Mistral Moderation 2411 is a content moderation model from Mistral, offering high-accuracy text moderation across nine safety categories and multiple languages. It is designed for robust, real-time moderation in diverse environments.

$0.05/1M tokens

gpt-image-1

OpenAI’s new state-of-the-art image generation model. This is a natively multimodal language model that accepts both text and image inputs and produces image outputs. It powers image generation in ChatGPT, offering exceptional prompt adherence, a high level of detail, and quality.

claude-3.5-haiku-20241022

Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.

$0.40/1M input tokens$2.00/1M output tokens

claude-3.5-sonnet-20241022

Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.

$1.50/1M input tokens$7.50/1M output tokens

claude-3.7-sonnet-20250219

Claude 3.7 Sonnet is an advanced large language model from Anthropic, featuring improved reasoning, coding, and problem-solving abilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model excels in agentic workflows, front-end development, and full-stack updates, and offers an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following.

$1.50/1M input tokens$7.50/1M output tokens

claude-sonnet-4-20250514

Claude Sonnet 4 is a next-generation model from Anthropic, significantly enhancing coding and reasoning capabilities over its predecessor. It achieves state-of-the-art performance on SWE-bench, balances capability and computational efficiency, and is optimized for both routine and complex software development projects. Key features include improved codebase navigation, reduced error rates, and increased reliability in following intricate instructions.

$1.50/1M input tokens$7.50/1M output tokens

claude-opus-4-20250514

Claude Opus 4.1 is Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves high scores on SWE-bench Verified and excels in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted workflows.

$10.00/1M input tokens$50.00/1M output tokens

mistral-small-2503

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral, optimized for instruction following, repetition reduction, and improved function calling. It supports both image and text inputs, delivers strong performance across coding, STEM, and vision benchmarks, and is designed for efficient, structured output generation.

$0.05/1M input tokens$0.15/1M output tokens

mistral-medium-2505

Mistral Medium 3 is a high-performance, enterprise-grade language model that balances state-of-the-art reasoning and multimodal capabilities with significantly reduced operational cost. It excels in coding, STEM reasoning, and enterprise adaptation, and is optimized for scalable deployments across professional and industrial use cases, including hybrid and on-prem environments.

$0.20/1M input tokens$1.00/1M output tokens

pixtral-large-2411

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. It is capable of understanding documents, charts, and natural images, and is available under both research and commercial licenses. The model is designed for advanced document and image analysis tasks.

$1.00/1M input tokens$3.00/1M output tokens

llama-3.2-11b-vision-instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed for tasks combining visual and textual data. It excels at image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it is ideal for content creation, AI-driven customer service, and research.

$0.10/1M input tokens$0.10/1M output tokens

llama-4-scout-17b-16e-instruct

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.

$0.24/1M input tokens$0.96/1M output tokens

llama-4-maverick-17b-128e-instruct

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Features early fusion for native multimodality and a 1 million token context window.

$0.28/1M input tokens$1.10/1M output tokens

llama-guard-4-12b

Llama Guard 4 is a multimodal content safety classifier derived from Llama 4 Scout, fine-tuned for both prompt and response classification. It supports content moderation for English and multiple languages, including mixed text-and-image prompts. The model is aligned with the MLCommons hazards taxonomy and is integrated into the Llama Moderations API for robust safety classification in text and images.

$0.02/1M input tokens$0.02/1M output tokens

grok-2-vision-1212

xAI’s Grok 2 Vision 1212 is a next-generation vision-language model designed for advanced image-based AI applications. It features robust visual comprehension, refined instruction-following, and strong multilingual support. The model excels at object recognition, style analysis, and visual reasoning, empowering developers to build intuitive, visually aware applications. Enhanced steerability and reasoning capabilities make it a solid foundation for next-generation image solutions. For more details, see the official [xAI announcement](https://x.ai/blog/grok-1212).

$1.00/1M input tokens$5.00/1M output tokens

gemini-1.5-flash-002

Gemini Flash 1.5 8B is optimized for speed and efficiency, delivering enhanced performance in small prompt tasks such as chat, transcription, and translation. Focuses on cost-effective solutions while maintaining high-quality results, making it suitable for real-time and large-scale operations.

$0.06/1M input tokens$0.22/1M output tokens

gemini-1.5-pro-002

Google’s latest multimodal model, supporting both image and video (where available) in text or chat prompts. Optimized for a wide range of language tasks, including code generation, text editing, problem solving, recommendations, and AI agent workflows.

$0.94/1M input tokens$3.75/1M output tokens

gemini-2.0-flash-lite-001

Gemini 2.0 Flash Lite is optimized for extremely fast response times and low cost, while maintaining the quality of larger models. Ideal for real-time and large-scale applications.

$0.04/1M input tokens$0.15/1M output tokens

gemini-2.0-flash-001

Gemini Flash 2.0 offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.

$0.05/1M input tokens$0.20/1M output tokens

gemma-3-27b-it

Google’s latest open-source multimodal model, Gemma 3 27B, supports vision-language input and text outputs, handles context windows up to 128k tokens, and understands over 140 languages. Offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.

$0.05/1M input tokens$0.10/1M output tokens

gpt-4-turbo-2024-04-09

The latest GPT-4 Turbo model with vision capabilities, supporting JSON mode and function calling. Trained on data up to December 2023, it is optimized for high-throughput, multimodal applications.

$5.00/1M input tokens$15.00/1M output tokens

gpt-4o-mini-2024-07-18

OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving SOTA intelligence and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.

$0.07/1M input tokens$0.30/1M output tokens

gpt-4o-2024-05-13

GPT-4o (“o” for “omni”) is OpenAI’s latest multimodal model, supporting both text and image inputs with text outputs. Delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.

$2.50/1M input tokens$7.50/1M output tokens

gpt-4o-2024-08-06

The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON schema in responses. Maintains high intelligence and efficiency, with enhanced non-English and visual performance.

$1.25/1M input tokens$5.00/1M output tokens

gpt-4o-2024-11-20

The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. Maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.

$1.25/1M input tokens$5.00/1M output tokens

gpt-4o-search-preview-2025-03-11

Specialized GPT-4o variant trained for web search understanding and execution within chat completions, enabling advanced search query comprehension.

$1.25/1M input tokens$5.00/1M output tokens

o1-2024-12-17

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. Trained with large-scale reinforcement learning for chain-of-thought reasoning, it is optimized for math, science, programming, and other STEM tasks, consistently achieving PhD-level accuracy on industry benchmarks.

$7.50/1M input tokens$30.00/1M output tokens

o3-2025-04-16

A well-rounded, powerful model from OpenAI, setting new standards in math, science, coding, and visual reasoning. Excels at technical writing and instruction-following, and is designed for multi-step problem solving across text, code, and images. BYOK is required for access.

$1.00/1M input tokens$4.00/1M output tokens

o4-mini-2025-04-16

A compact reasoning model in OpenAI’s o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. Supports tool use and demonstrates competitive reasoning and coding performance across benchmarks, outperforming its predecessor o3-mini and approaching o3 in some domains. Well-suited for high-throughput scenarios where latency or cost is critical.

$0.55/1M input tokens$2.20/1M output tokens

chatgpt-4o-latest

The continually updated version of OpenAI ChatGPT 4o, always pointing to the current GPT-4o model used by ChatGPT. Incorporates additional RLHF and may differ from the API version. Intended for research and evaluation, not recommended for production as it may be redirected or removed in the future.

$2.50/1M input tokens$7.50/1M output tokens

gpt-4.1-nano-2025-04-14

The fastest and most cost-effective model in the GPT-4.1 series, designed for tasks demanding low latency such as classification and autocompletion. Maintains a 1 million token context window and delivers exceptional performance at a small size, outperforming even some larger models on key benchmarks.

$0.05/1M input tokens$0.20/1M output tokens

gpt-4.1-mini-2025-04-14

A mid-sized GPT-4.1 model delivering performance competitive with GPT-4o at substantially lower latency and cost. Retains a 1 million token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.

$0.20/1M input tokens$0.80/1M output tokens

gpt-4.1-2025-04-14

A flagship large language model from OpenAI, optimized for advanced instruction following, real-world software engineering, and long-context reasoning. Supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 in coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding. Tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.

$1.00/1M input tokens$4.00/1M output tokens

codex-mini

A fine-tuned version of o4-mini, specifically optimized for use in Codex CLI. Recommended for code-related tasks, with improved performance in code generation and completion.

$0.75/1M input tokens$3.00/1M output tokens

Categories

Input Modalities

Output Modalities

Tiers

47 models

Sort

Sort

47 models

Models | NagaAI