A speech-to-text model that uses GPT-4o to transcribe audio. Compared with the original Whisper models, it offers a lower word error rate, better language recognition, and higher overall accuracy, which makes it a good fit when you need more precise transcripts.
Pricing
Pay-as-you-go rates for this model.
Transcription (per minute of audio): $0.0038
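A rough cost estimate follows directly from the listed per-minute transcription rate. The sketch below assumes that billing is prorated linearly by the second, with no rounding up to whole minutes. That is an assumption; the page does not state the billing granularity. The function name is hypothetical.

```python
# Listed pay-as-you-go transcription rate, in USD per minute of audio.
RATE_PER_MINUTE_USD = 0.0038


def estimate_transcription_cost(duration_seconds: float) -> float:
    """Estimate the USD cost of transcribing a clip of the given length.

    Assumption: the per-minute rate is prorated linearly by the second
    (no rounding up to whole minutes). Actual billing may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * RATE_PER_MINUTE_USD
```

Under this assumption, a one-hour recording costs 60 × $0.0038 = $0.228, or about $0.23.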
Capabilities
Input Modalities: Audio
Output Modalities: Text
Supported Parameters
Parameters available for API requests: Language
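The page lists Language as the only supported request parameter. The sketch below assumes an OpenAI-style transcription endpoint, where the `language` field takes an ISO-639-1 code such as `en`, and shows one way to assemble and validate the non-file form fields before sending a request. The helper `build_transcription_fields` and the `MODEL_ID` placeholder are hypothetical. The page does not give the model's API identifier. The audio upload and the HTTP call are left out.

```python
from typing import Optional

# Hypothetical placeholder: substitute the model identifier your provider uses.
MODEL_ID = "YOUR_MODEL_ID"


def build_transcription_fields(model: str, language: Optional[str] = None) -> dict:
    """Return the non-file form fields for a transcription request.

    Assumption: `language` is an ISO-639-1 code (two ASCII letters, e.g. "en").
    Leaving it unset lets the model detect the spoken language itself.
    """
    fields = {"model": model}
    if language is not None:
        code = language.strip().lower()
        # Reject anything that is not exactly two ASCII letters.
        if len(code) != 2 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"expected an ISO-639-1 code, got {language!r}")
        fields["language"] = code
    return fields


# Example: request English transcription with the placeholder model id.
print(build_transcription_fields(MODEL_ID, "EN"))
```

Specifying the language up front is optional. Where the endpoint supports it, supplying the correct code can help accuracy when the audio is short or the language is ambiguous.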