openai/gpt-4o-mini-transcribe 📝🖼️🔢 → 📝

⭐ Official ▶️ 11.4K runs 📅 May 2025 ⚙️ Cog 0.16.8

multilingual speech-to-text transcription

About

A speech-to-text model that uses GPT-4o mini to transcribe audio.

Example Output

So we just added GPT-4o Mini Transcribe to Replicate, and thought you'd want to know. It's basically a speech-to-text model that uses GPT-4o Mini to turn your audio into text. The cool thing is that it's noticeably better than the Whisper models we've been using. Fewer errors, better at recognizing different languages, and just more accurate overall. If you've ever been frustrated with transcripts that mess up technical terms or struggle with different accents, you'll probably appreciate this upgrade. It just works better. Some quick tech specs if you're curious. It has a 16,000 token context window, which means it can handle longer audio clips in one go. And it can output up to 2,000 tokens, so you'll get nice complete transcripts. The model's knowledge is current up to June 2024.

Performance Metrics

4.09s Prediction Time

4.10s Total Time

All Input Parameters

{
  "language": "en",
  "audio_file": "https://replicate.delivery/xezq/ejt5KPWzFp25fUGtjPhwFmeeG5nFpCvu5zSMIySXnemTWn0lC/tmptuxz6n1z.mp3",
  "temperature": 0
}
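
The parameters above map directly onto a client call. The following is a minimal sketch using the `replicate` Python client, assuming the package is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `transcribe` are illustrative and not part of any API.

```python
from typing import Any, Dict, Optional

MODEL = "openai/gpt-4o-mini-transcribe"


def build_input(audio_file: str, language: Optional[str] = None,
                temperature: float = 0, prompt: Optional[str] = None) -> Dict[str, Any]:
    """Assemble the input payload, leaving out optional fields that were not given."""
    payload: Dict[str, Any] = {"audio_file": audio_file, "temperature": temperature}
    if language is not None:
        payload["language"] = language
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


def transcribe(audio_file: str, **options: Any) -> Any:
    """Run the model and return its raw output (needs network access and an API token)."""
    import replicate  # imported lazily so build_input works without the client installed
    return replicate.run(MODEL, input=build_input(audio_file, **options))
```

Calling `transcribe("https://example.com/clip.mp3", language="en")` would send the same three fields shown in the example input, with a placeholder URL in place of the real audio file.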

Input Parameters

prompt Type: string: Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio.
language Type: string: The language of the input audio. Supplying it in ISO 639-1 format (e.g. en) improves accuracy and latency.
audio_file (required) Type: string: The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
temperature Type: number, Default: 0, Range: 0 - 1: Sampling temperature between 0 and 1.
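
The constraints listed above can be checked on the client before a request is sent. This is a rough sketch, not the service's own validation: it assumes the format can be inferred from the file extension, and the function name `check_inputs` is hypothetical.

```python
from typing import Optional

SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def check_inputs(audio_file: str, temperature: float = 0,
                 language: Optional[str] = None) -> None:
    """Raise ValueError if the inputs violate the documented constraints."""
    # Drop any query string or fragment before inspecting the extension.
    path = audio_file.split("?", 1)[0].split("#", 1)[0]
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    # Shape check only (two ASCII letters); this does not verify that the
    # code is an assigned ISO 639-1 language.
    if language is not None and not (
        len(language) == 2 and language.isascii() and language.isalpha()
    ):
        raise ValueError("language should be a two-letter ISO 639-1 code such as 'en'")
```

Failing early this way avoids spending a prediction on a request that the documented limits already rule out.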

Output Schema

Type: array • Items Type: string
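
Because the output is an array of strings rather than a single string, callers usually need to combine the items. The sketch below assumes the items are consecutive fragments of one transcript that can be concatenated without a separator; the page does not state this, so treat it as an assumption. `join_transcript` is an illustrative name.

```python
from typing import Iterable


def join_transcript(chunks: Iterable[str]) -> str:
    """Concatenate output items in order and trim surrounding whitespace."""
    return "".join(chunks).strip()
```

Since `"".join` accepts any iterable of strings, the same helper works whether the client returns a list or yields items one at a time.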

Example Execution Logs

Input audio duration: 52.2 seconds
Input token count: 870
Output token count: 167
Total token count: 1037
TTFT: 2.46s
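
The log lines above follow a simple "label: number" pattern, so they can be turned into numbers for monitoring. This is a sketch that assumes the log format matches this single example; `parse_logs` is a hypothetical helper, and the per-second ratio describes this one run only.

```python
import re
from typing import Dict

LOG_LINE = re.compile(r"^(?P<key>[^:]+):\s*(?P<value>\d+(?:\.\d+)?)")


def parse_logs(text: str) -> Dict[str, float]:
    """Map each 'Label: number ...' line to its numeric value, ignoring unit suffixes."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            metrics[match.group("key").strip()] = float(match.group("value"))
    return metrics


logs = """Input audio duration: 52.2 seconds
Input token count: 870
Output token count: 167
Total token count: 1037
TTFT: 2.46s"""

metrics = parse_logs(logs)
# The total is the sum of input and output tokens: 870 + 167 = 1037.
assert (metrics["Input token count"] + metrics["Output token count"]
        == metrics["Total token count"])
# Input tokens per second of audio for this run (870 / 52.2, roughly 16.7).
tokens_per_second = metrics["Input token count"] / metrics["Input audio duration"]
```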

Version Details

Version ID: 684265b6c4d23a4f5b3536a76e0b9e022ce5084f6da95fd7d0b5ebbc573a8261
Version Created: November 7, 2025
