Models
Explore AI models available through NagaAI.
Step 3.5 Flash is our most capable open-source foundation model, designed to deliver frontier-level reasoning and agentic performance with standout efficiency. It uses a sparse Mixture of Experts (MoE) architecture that activates only 11B of its 196B parameters per token, concentrating “intelligence density” to approach top proprietary models while staying fast enough for real-time interaction. Built for rapid, deep reasoning, it is powered by 3-way Multi-Token Prediction (MTP-3), which enables typical generation speeds of 100–300 tok/s (up to ~350 tok/s in single-stream coding). For coding and long-horizon agent work, it was trained with a scalable RL framework that supports stable autonomous execution, reaching 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0. For long-context workloads, Step 3.5 Flash offers a cost-efficient 256K context window via a hybrid attention design with a 3:1 Sliding Window Attention ratio (three SWA layers per full-attention layer), maintaining performance on large codebases and massive documents while cutting the compute burden typical of long-context models.
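A minimal sketch of calling the model, assuming an OpenAI-compatible chat endpoint; the base URL, API key placeholder, and the "step-3.5-flash" model ID are assumptions, so check the live model list for the exact values:

```python
from openai import OpenAI

# Base URL and model ID are assumptions; substitute the values from your
# NagaAI dashboard and the live model listing.
client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")

resp = client.chat.completions.create(
    model="step-3.5-flash",  # hypothetical ID for Step 3.5 Flash
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of sparse MoE models."},
    ],
)
print(resp.choices[0].message.content)
```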
Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing. It delivers exceptional performance in creative writing, storytelling, role-play, and conversational scenarios, surpassing typical reasoning models, particularly in real-time voice assistance. The model is also optimized for agentic work: it navigates agent frameworks such as OpenCode, Cline, and Kilo Code while managing intricate toolchains and long, constraint-heavy prompts. The architecture natively supports context windows of up to 512k tokens, though the current Preview API is served at a 128k context using 8-bit quantization for efficient deployment. Trinity-Large-Preview reflects Arcee’s efficiency-first design: a production-grade frontier model with open weights and permissive licensing, suited to both real-world deployment and rigorous experimentation.
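Because the model is tuned for tool use, a hedged sketch of OpenAI-style function calling may help; the "trinity-large-preview" ID and the get_weather tool are illustrative assumptions, not values from the source:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")  # URL is an assumption

# A toy tool definition; real agent frameworks register many of these.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="trinity-large-preview",  # hypothetical ID
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)
# If the model decides to call the tool, the request shows up here.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```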
GPT-5 Mini is a compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. It retains the instruction-following and safety features of its larger counterpart at reduced latency and cost. As the direct successor to OpenAI’s o4-mini, it is well suited to scalable, cost-sensitive deployments.
Gemini 2.5 Flash is Google’s high-performance workhorse model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in “thinking” capabilities and can be tuned through a “max tokens for reasoning” parameter to trade response quality against latency and cost.
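The exact spelling of the reasoning-budget parameter varies by gateway; the sketch below passes it via the OpenAI client’s extra_body, and both that field name and the "gemini-2.5-flash" ID are assumptions to verify against the NagaAI docs:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")  # URL is an assumption

resp = client.chat.completions.create(
    model="gemini-2.5-flash",  # hypothetical ID
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Field name is an assumption; consult the NagaAI docs for the exact
    # spelling of the reasoning-budget parameter.
    extra_body={"reasoning": {"max_tokens": 2048}},
)
print(resp.choices[0].message.content)
```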
Eleven-Multilingual-v2 is ElevenLabs’ most advanced multilingual text-to-speech model, delivering high-quality voice synthesis across a wide range of languages with improved realism and expressiveness. It is optimized for both accuracy and naturalness in multilingual scenarios.
DALL-E 3 is OpenAI’s third-generation text-to-image model, offering enhanced detail, accuracy, and the ability to understand complex prompts. It excels at generating realistic and creative images, handling intricate details like text and human anatomy, and supports various aspect ratios for flexible output.
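A minimal sketch against an OpenAI-style images endpoint; the base URL is an assumption, while "dall-e-3" and the listed sizes follow OpenAI’s own API:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")  # URL is an assumption

img = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor lighthouse on a cliff at dawn, with a legible sign reading 'North Point'",
    size="1792x1024",  # DALL-E 3 also accepts 1024x1024 and 1024x1792
)
print(img.data[0].url)  # URL of the generated image
```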
A text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text into natural-sounding spoken audio.
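A hedged sketch of a speech request using the OpenAI audio API shape; the base URL is an assumption, and "gpt-4o-mini-tts" with the stock "alloy" voice follows OpenAI’s naming:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")  # URL is an assumption

speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",  # one of the stock OpenAI voices
    input="Thanks for calling; how can I help you today?",
)
speech.write_to_file("greeting.mp3")  # writes the returned audio bytes to disk
```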
Flux-1-Schnell is a high-speed, open-source text-to-image model from Black Forest Labs, optimized for rapid, high-quality image generation in just a few steps. It is ideal for applications where speed and efficiency are critical.
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters. Optimized for multilingual dialogue, it outperforms many open-source and closed chat models on industry benchmarks. Supported languages include English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Whisper Large v3 is OpenAI’s state-of-the-art model for automatic speech recognition (ASR) and speech translation. Trained on over 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio, it generalizes strongly across datasets and domains, excelling at zero-shot transcription and translation.
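A minimal transcription sketch using the OpenAI-style audio endpoint; the base URL and the "whisper-large-v3" model ID are assumptions (OpenAI’s own API names its hosted Whisper "whisper-1"), and "meeting.wav" is a placeholder local file:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")  # URL is an assumption

# "meeting.wav" is a placeholder; any supported audio format works.
with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # hypothetical ID for this listing
        file=audio,
    )
print(transcript.text)
```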
Sonar is Perplexity’s lightweight, affordable, and fast question-answering model, now featuring citations and customizable sources. It is designed for companies seeking to integrate rapid, citation-enabled Q&A features optimized for speed and simplicity.
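A hedged sketch of a citation-enabled query; the "sonar" ID and base URL are assumptions, and while Perplexity’s native API returns a top-level "citations" list of URLs, whether the gateway forwards it is also an assumption, so the sketch reads it defensively:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")  # URL is an assumption

resp = client.chat.completions.create(
    model="sonar",  # hypothetical ID
    messages=[{"role": "user", "content": "What are the latest findings on sea-level rise?"}],
)
print(resp.choices[0].message.content)

# "citations" is Perplexity's native field; it may not be forwarded here,
# so fall back to an empty list rather than assuming it exists.
for url in getattr(resp, "citations", None) or []:
    print(url)
```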
Kandinsky-3.1 is a large text-to-image diffusion model developed by Sber and AIRI, featuring 11.9 billion parameters. The model consists of a text encoder, U-Net, and decoder, enabling high-quality, detailed image generation from text prompts. It is trained on extensive datasets and is designed for both creative and scientific applications.
Stable Diffusion XL (SDXL) is a powerful text-to-image generation model from Stability AI, featuring a 3x larger UNet, dual text encoders (OpenCLIP ViT-bigG/14 alongside the original CLIP ViT-L), and a two-stage base-plus-refiner pipeline for generating highly detailed, controllable images. It introduces size- and crop-conditioning for greater control and quality in image generation.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout routes each token among 16 experts (the “16E” in its name) and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
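Since Scout accepts image input, a sketch using the standard OpenAI-style multimodal message format; the base URL, the "llama-4-scout-17b-16e-instruct" ID, and the image URL are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.naga.ac/v1", api_key="YOUR_NAGAAI_KEY")  # URL is an assumption

resp = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",  # hypothetical ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart in two sentences."},
            # Standard OpenAI-style image content part; a data: URL with
            # base64-encoded image bytes also works.
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```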
GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains the 1 million token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.