openai/gpt-4o-mini-transcribe 📝🖼️🔢 → 📝

⭐ Official ▶️ 1.7K runs 📅 May 2025 ⚙️ Cog 0.16.8
multilingual speech-to-text transcription

About

A speech-to-text model that uses GPT-4o mini to transcribe audio.

Example Output

So we just added GPT-4o Mini Transcribe to Replicate, and thought you'd want to know. It's basically a speech-to-text model that uses GPT-4o Mini to turn your audio into text. The cool thing is that it's noticeably better than the Whisper models we've been using. Fewer errors, better at recognizing different languages, and just more accurate overall. If you've ever been frustrated with transcripts that mess up technical terms or struggle with different accents, you'll probably appreciate this upgrade. It just works better. Some quick tech specs if you're curious. It has a 16,000 token context window, which means it can handle longer audio clips in one go. And it can output up to 2,000 tokens, so you'll get nice complete transcripts. The model's knowledge is current up to June 2024.

Performance Metrics

Prediction Time: 4.09s
Total Time: 4.10s
All Input Parameters
{
  "language": "en",
  "audio_file": "https://replicate.delivery/xezq/ejt5KPWzFp25fUGtjPhwFmeeG5nFpCvu5zSMIySXnemTWn0lC/tmptuxz6n1z.mp3",
  "temperature": 0
}
Input Parameters
prompt Type: string
Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio.
language Type: string
The language of the input audio. Supplying the language as an ISO 639-1 code (e.g. en) improves accuracy and latency.
audio_file (required) Type: string
The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm
temperature Type: number Default: 0 Range: 0 - 1
Sampling temperature between 0 and 1
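The parameters above can be assembled and sent with the Replicate Python client. The following is a minimal sketch, not an official example: `build_input` and `transcribe` are hypothetical helper names, the extension check is a simplistic assumption that will reject URLs with query strings, and joining the returned strings with no separator is a guess about how the output array is chunked. Running `transcribe` assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set.

```python
from typing import Optional

# Formats listed under audio_file on this page.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def build_input(
    audio_file: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: float = 0.0,
) -> dict:
    """Build the input payload (hypothetical helper, not part of any library)."""
    # Naive check: take whatever follows the last dot as the extension.
    extension = audio_file.rsplit(".", 1)[-1].lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    payload = {"audio_file": audio_file, "temperature": temperature}
    # Optional fields are only sent when the caller provides them.
    if language is not None:
        payload["language"] = language
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


def transcribe(payload: dict) -> str:
    """Run the model and return the transcript as one string."""
    import replicate  # third-party client; imported lazily so build_input works without it

    output = replicate.run("openai/gpt-4o-mini-transcribe", input=payload)
    # The output schema is an array of strings; joining without a separator is an assumption.
    return "".join(output)
```

A call would then look like `transcribe(build_input("https://example.com/clip.mp3", language="en"))`, where the URL is a placeholder.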
Output Schema

Type: array Items Type: string

Example Execution Logs
Input audio duration: 52.2 seconds
Input token count: 870
Output token count: 167
Total token count: 1037
TTFT: 2.46s
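The figures in this log can be checked with a little arithmetic. The sketch below parses the log lines shown above (the `parse_log` helper is hypothetical, and the "key: value" layout is assumed from this single example), confirms that input plus output tokens equal the total, and derives the input token rate per second of audio.

```python
import re

# Log text copied from the example execution above.
LOG = """Input audio duration: 52.2 seconds
Input token count: 870
Output token count: 167
Total token count: 1037
TTFT: 2.46s"""


def parse_log(text: str) -> dict:
    """Map each 'key: value' line to the first number found in its value."""
    values = {}
    for line in text.splitlines():
        key, _, raw = line.partition(":")
        match = re.search(r"\d+(?:\.\d+)?", raw)
        if match:
            values[key.strip()] = float(match.group())
    return values


stats = parse_log(LOG)

# 870 input tokens + 167 output tokens should equal the reported total of 1037.
tokens_add_up = (
    stats["Input token count"] + stats["Output token count"]
    == stats["Total token count"]
)

# Input tokens consumed per second of audio for this one clip.
tokens_per_second = stats["Input token count"] / stats["Input audio duration"]

print(f"totals consistent: {tokens_add_up}")
print(f"input tokens per audio second: {tokens_per_second:.2f}")
```

For this clip the rate works out to roughly 16.7 input tokens per second of audio. It is a single data point, so it should not be read as a fixed property of the model.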
Version Details
Version ID
60b48ed4cd354f9482bf2d4d39df27a6da5191dd93b5f411592da2fdc7e72e2d
Version Created
October 13, 2025