Explore a wide range of AI models available through the NagaAI platform.
Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it's the exact same model, except that the number of decoding layers have reduced from 32 to 4. As a result, the model is way faster, at the expense of a minor quality degradation.
Gemini 2.5 Flash is Google’s high-performance workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. Includes built-in "thinking" capabilities and is configurable through a "max tokens for reasoning" parameter for fine-tuned performance.
Gemini 2.5 Pro is Google’s state-of-the-art AI model, designed for advanced reasoning, coding, mathematics, and scientific tasks. Employs “thinking” capabilities for nuanced context handling and achieves top-tier performance on multiple benchmarks, including first-place on the LMArena leaderboard.
Scribe-v1 is a cutting-edge speech recognition model from ElevenLabs, designed for accurate speech-to-text transcription in 99 languages. It excels at handling real-world audio and consistently outperforms models such as Gemini 2.0 Flash and Whisper Large V3, achieving notably low word error rates even in underserved languages.
Whisper Large v3 is OpenAI’s state-of-the-art model for automatic speech recognition (ASR) and speech translation. Trained on over 5 million hours of labeled data, it demonstrates strong generalization across datasets and domains, excelling in zero-shot transcription and translation tasks.
A speech-to-text model using GPT-4o for transcribing audio. It offers improved word error rate, better language recognition, and higher accuracy compared to the original Whisper models. Use it for more precise transcripts.
Gemini Flash 1.5 8B is optimized for speed and efficiency, delivering enhanced performance in small prompt tasks such as chat, transcription, and translation. Focuses on cost-effective solutions while maintaining high-quality results, making it suitable for real-time and large-scale operations.
Google’s latest multimodal model, supporting both image and video (where available) in text or chat prompts. Optimized for a wide range of language tasks, including code generation, text editing, problem solving, recommendations, and AI agent workflows.
Gemini 2.0 Flash Lite is optimized for extremely fast response times and low cost, while maintaining the quality of larger models. Ideal for real-time and large-scale applications.
Gemini Flash 2.0 offers significantly faster time to first token (TTFT) compared to previous versions, while maintaining quality on par with larger models. Introduces enhancements in multimodal understanding, coding, complex instruction following, and function calling for robust agentic experiences.