Models

Explore a wide range of AI models available through the NagaAI platform.

whisper-large-v3-turbo

Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is much faster, at the cost of a minor drop in quality.

by openai

~$0.0001/minute

gemini-2.5-flash

Gemini 2.5 Flash is Google’s high-performance workhorse model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, and the reasoning budget is configurable through a "max tokens for reasoning" parameter.

by google

$0.15/1M input tokens, $1.25/1M output tokens

gemini-2.5-pro

Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities for nuanced context handling and achieves top-tier performance on multiple benchmarks, including first place on the LMArena leaderboard.

by google

$0.94/1M input tokens, $6.25/1M output tokens

scribe-v1

Scribe-v1 is a cutting-edge speech recognition model from ElevenLabs, designed for accurate speech-to-text transcription in 99 languages. It excels at handling real-world audio and consistently outperforms models such as Gemini 2.0 Flash and Whisper Large V3, achieving notably low word error rates even in underserved languages.

by elevenlabs

~$0.0025/minute

whisper-large-v3

Whisper Large v3 is OpenAI’s state-of-the-art model for automatic speech recognition (ASR) and speech translation. Trained on over 5 million hours of labeled data, it demonstrates strong generalization across datasets and domains, excelling in zero-shot transcription and translation tasks.

by openai

~$0.0008/minute

gpt-4o-transcribe

A speech-to-text model using GPT-4o for transcribing audio. It offers improved word error rate, better language recognition, and higher accuracy compared to the original Whisper models. Use it for more precise transcripts.

by openai

~$0.0052/minute

gemini-1.5-flash-002

Gemini Flash 1.5 8B is optimized for speed and efficiency, delivering enhanced performance in small-prompt tasks such as chat, transcription, and translation. It focuses on cost-effective solutions while maintaining high-quality results, making it suitable for real-time and large-scale operations.

by google

$0.06/1M input tokens, $0.22/1M output tokens

gemini-1.5-pro-002

Google’s latest multimodal model, supporting both image and video (where available) in text or chat prompts. Optimized for a wide range of language tasks, including code generation, text editing, problem solving, recommendations, and AI agent workflows.

by google

$0.94/1M input tokens, $3.75/1M output tokens

gemini-2.0-flash-lite-001

Gemini 2.0 Flash Lite is optimized for extremely fast response times and low cost, while maintaining the quality of larger models. Ideal for real-time and large-scale applications.

by google

$0.04/1M input tokens, $0.15/1M output tokens

gemini-2.0-flash-001

Gemini Flash 2.0 offers significantly faster time to first token (TTFT) than previous versions, while maintaining quality on par with larger models. It introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.

by google

$0.05/1M input tokens, $0.20/1M output tokens
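
The listing quotes two kinds of prices: per 1M input/output tokens for text models and an approximate per-minute rate for speech-to-text models. As a rough illustration of how those figures turn into a cost estimate, here is a minimal Python sketch. The function names `token_cost` and `audio_cost` are hypothetical helpers, not part of any NagaAI API, and the prices are simply copied from a few entries above, so they may be out of date.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the listing above.
TOKEN_PRICES = {
    "gemini-2.5-flash": (Decimal("0.15"), Decimal("1.25")),
    "gemini-2.5-pro": (Decimal("0.94"), Decimal("6.25")),
    "gemini-2.0-flash-001": (Decimal("0.05"), Decimal("0.20")),
}

# Approximate USD per minute of audio, copied from the listing above.
AUDIO_PRICES = {
    "whisper-large-v3-turbo": Decimal("0.0001"),
    "whisper-large-v3": Decimal("0.0008"),
    "scribe-v1": Decimal("0.0025"),
    "gpt-4o-transcribe": Decimal("0.0052"),
}


def token_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request to a token-priced model."""
    input_price, output_price = TOKEN_PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


def audio_cost(model: str, minutes: int) -> Decimal:
    """Estimate the USD cost of transcribing `minutes` of audio."""
    return AUDIO_PRICES[model] * minutes


if __name__ == "__main__":
    # 200k input tokens and 50k output tokens on gemini-2.5-flash
    print(token_cost("gemini-2.5-flash", 200_000, 50_000))
    # one hour of audio on scribe-v1
    print(audio_cost("scribe-v1", 60))
```

Decimal is used instead of float so that small per-minute rates do not pick up rounding noise. The per-minute prices are marked "~" in the listing, so the audio figures are estimates only.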
