Models
Explore a broad selection of AI models available on the NagaAI platform.
DeepSeek-TNG-R1T2-Chimera is TNG Tech's second-generation Chimera text-generation model. Built from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints using Assembly-of-Experts merging, this 671B-parameter model combines strengths from all three parents. The tri-parent design delivers strong reasoning while running about 20% faster than the original R1 and more than twice as fast as R1-0528 on vLLM, striking a good balance of cost and performance. The model supports input contexts of up to 60k tokens (tested up to ~130k) and exhibits stable <think>-token behavior, making it well suited to long-context analysis, dialogue, and general text generation.
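Models in this catalog are typically reached through an OpenAI-compatible chat completions API. A minimal sketch follows; the base URL, environment-variable name, and model ID are assumptions, not confirmed platform values (later examples on this page reuse this `client`):

```python
# Minimal sketch of an OpenAI-compatible chat completion.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",    # assumed endpoint
    api_key=os.environ["NAGA_API_KEY"],   # assumed environment-variable name
)

resp = client.chat.completions.create(
    model="deepseek-tng-r1t2-chimera",  # assumed model ID for this entry
    messages=[{"role": "user", "content": "Summarize the key risks in this report: ..."}],
)
print(resp.choices[0].message.content)
```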
A compact variant of GPT-5, designed for efficient handling of lighter-weight reasoning and conversational tasks. GPT-5 Mini retains the instruction-following and safety features of its larger counterpart, but with reduced latency and cost. It is the direct successor to OpenAI’s o4-mini model, making it ideal for scalable, cost-sensitive deployments.
Gemini 2.5 Flash is Google’s high-performance workhorse model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities and can be tuned via a "max tokens for reasoning" parameter for fine-grained control over performance.
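If the platform exposes the reasoning budget through its OpenAI-compatible endpoint, it might be passed as an extra body field. A hypothetical sketch, reusing the client from the first example; the `reasoning_max_tokens` field name and model ID are assumptions to verify against the platform docs:

```python
# Hypothetical sketch: capping the model's reasoning-token budget.
resp = client.chat.completions.create(
    model="gemini-2.5-flash",  # assumed model ID
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"reasoning_max_tokens": 2048},  # assumed parameter name
)
print(resp.choices[0].message.content)
```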
Eleven-Multilingual-v2 is ElevenLabs’ most advanced multilingual text-to-speech model, delivering high-quality voice synthesis across a wide range of languages with improved realism and expressiveness. It is optimized for both accuracy and naturalness in multilingual scenarios.
DALL-E 3 is OpenAI’s third-generation text-to-image model, offering enhanced detail, accuracy, and the ability to understand complex prompts. It excels at generating realistic and creative images, handling intricate details like text and human anatomy, and supports various aspect ratios for flexible output.
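Assuming the platform mirrors OpenAI's images API, a landscape render could look like the sketch below (the size values follow OpenAI's dall-e-3 options; the model ID is taken from this entry):

```python
# Sketch: generating a wide-aspect image with DALL-E 3.
img = client.images.generate(
    model="dall-e-3",
    prompt="A lighthouse at dusk with the word 'naga' drawn in the sand",
    size="1792x1024",  # landscape; "1024x1792" for portrait, "1024x1024" for square
    quality="hd",
)
print(img.data[0].url)
```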
A text-to-speech model built on GPT-4o mini, a fast and powerful language model. Use it to convert text into natural-sounding spoken audio.
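A short sketch of speech synthesis, assuming an OpenAI-style audio endpoint (the model ID and voice name are assumptions):

```python
# Sketch: converting text to spoken audio and saving it as MP3.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",  # assumed model ID
    voice="alloy",            # assumed voice name
    input="Hello from the NagaAI model catalog!",
) as response:
    response.stream_to_file("hello.mp3")
```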
Flux-1-Schnell is a high-speed, open-source text-to-image model from Black Forest Labs, optimized for rapid, high-quality image generation in just a few steps. It is ideal for applications where speed and efficiency are critical.
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters. Optimized for multilingual dialogue, it outperforms many open-source and closed chat models on industry benchmarks. Supported languages include English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Sonar Reasoning is a Perplexity model based on DeepSeek R1, designed for long chain-of-thought reasoning with built-in web search. It is uncensored, hosted in US datacenters, and allows developers to leverage extended reasoning for complex queries, making it suitable for research and knowledge-intensive applications.
GPT-4o (“o” for “omni”) is OpenAI’s flagship multimodal model, supporting both text and image inputs with text outputs. It delivers improved performance in non-English languages and visual understanding, while being faster and more cost-effective than previous models.
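Image inputs use the standard OpenAI content-parts format; a minimal sketch (model ID assumed, image URL hypothetical):

```python
# Sketch: sending text plus an image to GPT-4o.
resp = client.chat.completions.create(
    model="gpt-4o",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```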
The August 2024 version of GPT-4o, offering improved structured output capabilities, including support for JSON Schema-constrained responses. It maintains high intelligence and efficiency, with enhanced non-English and visual performance.
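The JSON Schema support can be exercised via `response_format`, following OpenAI's structured-outputs shape (the model ID and schema below are illustrative):

```python
# Sketch: schema-constrained JSON extraction.
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # assumed model ID
    messages=[{"role": "user", "content": "Extract: 'Ada Lovelace, born 1815.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "birth_year": {"type": "integer"},
                },
                "required": ["name", "birth_year"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # e.g. {"name": "Ada Lovelace", "birth_year": 1815}
```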
Whisper Large v3 is OpenAI’s state-of-the-art model for automatic speech recognition (ASR) and speech translation. Trained on over 5 million hours of labeled data, it demonstrates strong generalization across datasets and domains, excelling in zero-shot transcription and translation tasks.
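Transcription and translation both follow OpenAI's audio API shape; the `whisper-large-v3` model ID is an assumption for this platform:

```python
# Sketch: speech-to-text (same language) and speech translation (to English).
with open("meeting.ogg", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-large-v3", file=f)
print(transcript.text)

with open("interview_de.ogg", "rb") as f:
    translation = client.audio.translations.create(model="whisper-large-v3", file=f)
print(translation.text)
```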
Sonar is Perplexity’s lightweight, affordable, and fast question-answering model, now featuring citations and customizable sources. It is designed for companies seeking to integrate rapid, citation-enabled Q&A features optimized for speed and simplicity.
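A web-grounded Q&A call is just a chat completion; whether citations surface as a top-level `citations` field (as in Perplexity's own API) is an assumption to verify:

```python
# Sketch: citation-enabled Q&A with Sonar.
resp = client.chat.completions.create(
    model="sonar",  # assumed model ID
    messages=[{"role": "user", "content": "What changed in Python 3.13?"}],
)
print(resp.choices[0].message.content)
print(getattr(resp, "citations", None))  # source URLs, if the platform passes them through
```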
Kandinsky-3.1 is a large text-to-image diffusion model developed by Sber and AIRI, featuring 11.9 billion parameters. The model consists of a text encoder, U-Net, and decoder, enabling high-quality, detailed image generation from text prompts. It is trained on extensive datasets and is designed for both creative and scientific applications.
Stable Diffusion XL (SDXL) is a powerful text-to-image generation model from Stability AI, featuring a 3x larger UNet, dual text encoders (OpenCLIP ViT-bigG/14 alongside the original CLIP ViT-L), and a two-stage base-plus-refiner process for generating highly detailed, controllable images. It introduces size- and crop-conditioning for greater control and quality in image generation.
GPT-4.1 is OpenAI’s flagship model for advanced instruction following, software engineering, and long-context reasoning. It supports a 1-million-token context window and is tuned for precise code diffs, agent reliability, and high recall over large document contexts.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model from Meta, activating 17 billion parameters out of a 109B total. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts in its MoE layers and offers a context length of up to 10 million tokens, with a training corpus of roughly 40 trillion tokens. Built for high efficiency and local or commercial deployment, it is instruction-tuned for multilingual chat, captioning, and image understanding.
Claude 3.5 Haiku is Anthropic’s fastest model, featuring enhancements across coding, tool use, and reasoning. It is optimized for high interactivity and low latency, making it ideal for user-facing chatbots, on-the-fly code completions, data extraction, and real-time content moderation. The model does not support image inputs.
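Tool use follows the standard OpenAI tools format; a minimal sketch with a hypothetical `get_weather` function (model ID assumed):

```python
# Sketch: OpenAI-style tool calling with Claude 3.5 Haiku.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-3.5-haiku",  # assumed model ID
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model's requested tool invocation, if any
```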
The continually updated version of ChatGPT’s GPT-4o, always pointing to the current GPT-4o model used in ChatGPT. It incorporates additional RLHF and may differ from the API version. It is intended for research and evaluation; it is not recommended for production, as it may be redirected to another model or removed in the future.
A mid-sized GPT-4.1 variant delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains the 1-million-token context window and demonstrates strong coding ability and vision understanding, making it suitable for interactive applications with tight performance constraints.
The November 2024 release of GPT-4o, featuring enhanced creative writing, more natural and engaging responses, and improved file handling. It maintains the intelligence of GPT-4 Turbo while being twice as fast and 50% more cost-effective, with better support for non-English languages and visual tasks.
OpenAI’s most advanced small model, GPT-4o mini, supports both text and image inputs with text outputs. It is highly cost-effective, achieving state-of-the-art intelligence for its size and outperforming larger models on key benchmarks, making it ideal for scalable, interactive applications.
Gemini 2.0 Flash offers significantly faster time to first token (TTFT) than previous versions while maintaining quality on par with larger models. It introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.