zsxkib/voxtral ❓🖼️📝🔢 → 📝

▶️ 76 runs 📅 Jul 2025 ⚙️ Cog 0.16.0

audio-analysis audio-understanding speech-to-text

Performance

2.3s Typical run time

~264s Cold start (first call)

76 Total runs

About

Voxtral Mini (3B) + Small (24B): speech transcription and audio understanding in 8 languages.

Example Output

Prompt:

"What can you tell me about this audio?"

Output

من يتجه لحسم المعركة وما هي أهداف كل من تركيا وقصيد ومن وراءها الولايات المتحدة الأمريكية في هذه المواجهات؟

Performance Metrics

2.27s Prediction Time

263.56s Total Time

All Input Parameters

{
  "mode": "transcription",
  "audio": "https://replicate.delivery/pbxt/NPqAcUKAPImFd6Sva7qzqwdl4UCvCsSlUKJTVPAaovXzJeIQ/arabic_news_report.mp3",
  "prompt": "What can you tell me about this audio?",
  "language": "Auto-detect",
  "max_tokens": 500,
  "model_size": "mini"
}
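The same inputs could be submitted from Python. The following is a minimal sketch, assuming the official `replicate` client package is installed and a `REPLICATE_API_TOKEN` environment variable is set; the helper name `run_voxtral` is hypothetical, and the version hash is the one listed under Version Details on this page.

```python
# Version ID copied from the "Version Details" section of this page.
VERSION = "f5d491cbd58d6b048de5da796a4c6267621147b261cc72f02ebee4f39a94d5c5"
MODEL_REF = f"zsxkib/voxtral:{VERSION}"

# Inputs taken verbatim from the "All Input Parameters" example.
EXAMPLE_INPUT = {
    "mode": "transcription",
    "audio": "https://replicate.delivery/pbxt/NPqAcUKAPImFd6Sva7qzqwdl4UCvCsSlUKJTVPAaovXzJeIQ/arabic_news_report.mp3",
    "prompt": "What can you tell me about this audio?",
    "language": "Auto-detect",
    "max_tokens": 500,
    "model_size": "mini",
}


def run_voxtral(inputs):
    """Hypothetical helper: submit `inputs` to the model and return its output.

    The import is done lazily so this file can be loaded without the
    `replicate` package; calling the function requires the package,
    network access, and a REPLICATE_API_TOKEN environment variable.
    """
    import replicate

    return replicate.run(MODEL_REF, input=inputs)


# Usage (performs a network call, so it is left commented out):
# print(run_voxtral(EXAMPLE_INPUT))
```

Per the Output Schema, the result is expected to be a single string. The first call may take several minutes because of the cold start reported above.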

Input Parameters

mode (Default: transcription): Processing mode. 'transcription' converts speech to text; 'understanding' analyzes audio content using prompts.
audio (required, Type: string): Audio file to process.
prompt (Type: string, Default: "What can you tell me about this audio?"): Question or instruction for understanding mode (e.g., 'What is the speaker discussing?', 'Summarize this audio'). Ignored in transcription mode.
language (Default: Auto-detect): Audio language. 'Auto-detect' works for most content; choose a specific language for better accuracy.
max_tokens (Type: integer, Default: 500, Range: 50 - 1000): Maximum response length. Higher values allow longer outputs but increase processing time.
model_size (Default: mini): Model selection. 'mini' (3B) is faster and uses less GPU memory; 'small' (24B) provides higher accuracy for complex audio.
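The constraints listed above can be checked on the client before a request is sent. This is an illustrative sketch only: `build_input` is a hypothetical helper, not part of the model or of any Replicate library, and it encodes just the defaults, allowed values, and range stated in this parameter list. The set of accepted `language` values is not given here, so that field is passed through unchecked.

```python
DEFAULT_PROMPT = "What can you tell me about this audio?"
MODES = ("transcription", "understanding")
MODEL_SIZES = ("mini", "small")


def build_input(audio, mode="transcription", prompt=DEFAULT_PROMPT,
                language="Auto-detect", max_tokens=500, model_size="mini"):
    """Return an input dict for the model, validating documented constraints."""
    if not audio:
        raise ValueError("audio is required")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {MODEL_SIZES}, got {model_size!r}")
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
        raise ValueError("max_tokens must be an integer")
    if not 50 <= max_tokens <= 1000:
        raise ValueError("max_tokens must be between 50 and 1000")
    return {
        "mode": mode,
        "audio": audio,
        "prompt": prompt,  # documented as ignored in transcription mode
        "language": language,
        "max_tokens": max_tokens,
        "model_size": model_size,
    }
```

For example, `build_input("https://example.com/clip.mp3", mode="understanding", prompt="Summarize this audio")` yields a payload with the remaining fields at their documented defaults (the URL is a placeholder).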

Output Schema

Output

Type: string

Example Execution Logs

Using Voxtral Mini (3B) model
Mode: transcription
Auto-detecting language (using English as fallback)
Processing audio for transcription...
Generating transcription...
Transcription completed: 107 characters

Version Details

Version ID: f5d491cbd58d6b048de5da796a4c6267621147b261cc72f02ebee4f39a94d5c5
Version Created: July 24, 2025
