Models
Explore a broad selection of AI models available on the NagaAI platform.
DeepSeek-TNG-R1T2-Chimera is TNG Tech's second-generation Chimera text-generation model. Built from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints using Assembly-of-Experts merging, this 671B-parameter model combines strengths from all three parents. The tri-parent design delivers strong reasoning while running about 20% faster than the original R1 and more than twice as fast as R1-0528 on vLLM, striking a good balance of cost and performance. The model supports input contexts of up to 60k tokens (tested up to ~130k) and exhibits stable <think>-token behavior, making it well suited to long-context analysis, dialogue, and general text generation.
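Models in this catalog are typically reached through an OpenAI-compatible chat completions API. A minimal sketch follows; the base URL, environment-variable name, and model ID are assumptions, not confirmed platform values (later examples on this page reuse this `client`):

```python
# Minimal sketch of an OpenAI-compatible chat completion.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",    # assumed endpoint
    api_key=os.environ["NAGA_API_KEY"],   # assumed environment-variable name
)

resp = client.chat.completions.create(
    model="deepseek-tng-r1t2-chimera",  # assumed model ID for this entry
    messages=[{"role": "user", "content": "Summarize the key risks in this report: ..."}],
)
print(resp.choices[0].message.content)
```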
A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.
Gemini 2.5 Flash is Google’s high-performance workhorse model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities and can be tuned via a "max tokens for reasoning" parameter for fine-grained control over performance.
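If the platform exposes the reasoning budget through its OpenAI-compatible endpoint, it might be passed as an extra body field. A hypothetical sketch, reusing the client from the first example; the `reasoning_max_tokens` field name and model ID are assumptions to verify against the platform docs:

```python
# Hypothetical sketch: capping the model's reasoning-token budget.
resp = client.chat.completions.create(
    model="gemini-2.5-flash",  # assumed model ID
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"reasoning_max_tokens": 2048},  # assumed parameter name
)
print(resp.choices[0].message.content)
```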
Eleven-Multilingual-v2 is ElevenLabs’ most advanced multilingual text-to-speech model, delivering high-quality voice synthesis across a wide range of languages with improved realism and expressiveness. It is optimized for both accuracy and naturalness in multilingual scenarios.
DALL-E 3 is OpenAI’s third-generation text-to-image model, offering enhanced detail, accuracy, and the ability to understand complex prompts. It excels at generating realistic and creative images, handling intricate details like text and human anatomy, and supports various aspect ratios for flexible output.
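Assuming the platform mirrors OpenAI's images API, a landscape render could look like the sketch below (the size values follow OpenAI's dall-e-3 options; the model ID is taken from this entry):

```python
# Sketch: generating a wide-aspect image with DALL-E 3.
img = client.images.generate(
    model="dall-e-3",
    prompt="A lighthouse at dusk with the word 'naga' drawn in the sand",
    size="1792x1024",  # landscape; "1024x1792" for portrait, "1024x1024" for square
    quality="hd",
)
print(img.data[0].url)
```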
A text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text into natural-sounding spoken audio.
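A short sketch of speech synthesis, assuming an OpenAI-style audio endpoint (the model ID and voice name are assumptions):

```python
# Sketch: converting text to spoken audio and saving it as MP3.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",  # assumed model ID
    voice="alloy",            # assumed voice name
    input="Hello from the NagaAI model catalog!",
) as response:
    response.stream_to_file("hello.mp3")
```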
Flux-1-Schnell is a high-speed, open-source text-to-image model from Black Forest Labs, optimized for rapid, high-quality image generation in just a few steps. It is ideal for applications where speed and efficiency are critical.
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters. Optimized for multilingual dialogue, it outperforms many open-source and closed chat models on industry benchmarks. Supported languages include English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Sonar Reasoning is a Perplexity model based on DeepSeek R1, designed for long chain-of-thought reasoning with built-in web search. It is uncensored, hosted in US datacenters, and allows developers to leverage extended reasoning for complex queries, making it suitable for research and knowledge-intensive applications.
GPT-4o (“o” for “omni”) is OpenAI’s flagship multimodal model, supporting both text and image inputs with text outputs. It delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.
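Image inputs use the standard OpenAI content-parts format; a minimal sketch (model ID assumed, image URL hypothetical):

```python
# Sketch: sending text plus an image to GPT-4o.
resp = client.chat.completions.create(
    model="gpt-4o",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```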
The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON Schema-constrained responses. It maintains high intelligence and efficiency, with enhanced non-English and visual performance.
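The JSON Schema support can be exercised via `response_format`, following OpenAI's structured-outputs shape (the model ID and schema below are illustrative):

```python
# Sketch: schema-constrained JSON extraction.
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # assumed model ID
    messages=[{"role": "user", "content": "Extract: 'Ada Lovelace, born 1815.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "birth_year": {"type": "integer"},
                },
                "required": ["name", "birth_year"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # e.g. {"name": "Ada Lovelace", "birth_year": 1815}
```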
Whisper Large v3 is OpenAI’s state-of-the-art model for automatic speech recognition (ASR) and speech translation. Trained on over 5 million hours of labeled data, it demonstrates strong generalization across datasets and domains, excelling in zero-shot transcription and translation tasks.
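Transcription and translation both follow OpenAI's audio API shape; the `whisper-large-v3` model ID is an assumption for this platform:

```python
# Sketch: speech-to-text (same language) and speech translation (to English).
with open("meeting.ogg", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-large-v3", file=f)
print(transcript.text)

with open("interview_de.ogg", "rb") as f:
    translation = client.audio.translations.create(model="whisper-large-v3", file=f)
print(translation.text)
```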
Sonar is Perplexity’s lightweight, affordable, and fast question-answering model, now featuring citations and customizable sources. It is designed for companies seeking to integrate rapid, citation-enabled Q&A features optimized for speed and simplicity.
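A web-grounded Q&A call is just a chat completion; whether citations surface as a top-level `citations` field (as in Perplexity's own API) is an assumption to verify:

```python
# Sketch: citation-enabled Q&A with Sonar.
resp = client.chat.completions.create(
    model="sonar",  # assumed model ID
    messages=[{"role": "user", "content": "What changed in Python 3.13?"}],
)
print(resp.choices[0].message.content)
print(getattr(resp, "citations", None))  # source URLs, if the platform passes them through
```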
Kandinsky-3.1 is a large text-to-image diffusion model developed by Sber and AIRI, featuring 11.9 billion parameters. The model consists of a text encoder, U-Net, and decoder, enabling high-quality, detailed image generation from text prompts. It is trained on extensive datasets and is designed for both creative and scientific applications.
Stable Diffusion XL (SDXL) is a powerful text-to-image generation model from Stability AI, featuring a 3x larger UNet, dual text encoders (OpenCLIP ViT-bigG/14 alongside the original CLIP ViT-L), and a two-stage base-plus-refiner process for generating highly detailed, controllable images. It introduces size- and crop-conditioning for greater control and quality in image generation.
GPT-4.1 is OpenAI’s flagship model for advanced instruction following, software engineering, and long-context reasoning. It supports a 1-million-token context window and is tuned for precise code diffs, agent reliability, and high recall over large document contexts.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a 109B total. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts in its MoE layers and offers a context length of up to 10 million tokens, with a training corpus of roughly 40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
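Tool use follows the standard OpenAI tools format; a minimal sketch with a hypothetical `get_weather` function (model ID assumed):

```python
# Sketch: OpenAI-style tool calling with Claude 3.5 Haiku.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-3.5-haiku",  # assumed model ID
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model's requested tool invocation, if any
```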
The continually updated version of ChatGPT’s GPT-4o, always pointing to the current GPT-4o model used in ChatGPT. It incorporates additional RLHF and may differ from the API version. It is intended for research and evaluation; it is not recommended for production, as it may be redirected to another model or removed in the future.
A mid-sized GPT-4.1 variant delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains the 1-million-token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.
The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. It maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.
OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving state-of-the-art intelligence for its size and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.
Gemini 2.0 Flash offers significantly faster time to first token (TTFT) than previous versions while maintaining quality on par with larger models. It introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.