zsxkib/voxtral ❓🖼️📝🔢 → 📝

▶️ 76 runs 📅 Jul 2025 ⚙️ Cog 0.16.0
audio-analysis audio-understanding speech-to-text

About

Voxtral Mini (3B) + Small (24B) 🎙️ Speech transcription and audio understanding in 8 languages 🧠

Example Output

Prompt:

"What can you tell me about this audio?"

Output

من يتجه لحسم المعركة وما هي أهداف كل من تركيا وقصيد ومن وراءها الولايات المتحدة الأمريكية في هذه المواجهات؟

Performance Metrics

Prediction Time: 2.27s
Total Time: 263.56s

All Input Parameters
{
  "mode": "transcription",
  "audio": "https://replicate.delivery/pbxt/NPqAcUKAPImFd6Sva7qzqwdl4UCvCsSlUKJTVPAaovXzJeIQ/arabic_news_report.mp3",
  "prompt": "What can you tell me about this audio?",
  "language": "Auto-detect",
  "max_tokens": 500,
  "model_size": "mini"
}
Input Parameters
mode Default: transcription
Processing mode: 'transcription' converts speech to text; 'understanding' analyzes audio content using a prompt.
audio (required) Type: string
Audio file to process.
prompt Type: string, Default: What can you tell me about this audio?
Question or instruction for understanding mode (e.g., 'What is the speaker discussing?', 'Summarize this audio'). Ignored in transcription mode.
language Default: Auto-detect
Audio language. 'Auto-detect' works for most content; choose a specific language for better accuracy.
max_tokens Type: integer, Default: 500, Range: 50 - 1000
Maximum response length. Higher values allow longer outputs but increase processing time.
model_size Default: mini
Model selection: 'mini' (3B) is faster and uses less GPU memory; 'small' (24B) provides higher accuracy for complex audio.
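
The parameters above can be assembled and checked before a request is sent. The following is a minimal sketch, not the model author's code: `build_input` and `run_voxtral` are hypothetical helper names, the model reference is assembled from the owner/name and Version ID shown on this page, and the call assumes the third-party `replicate` Python client is installed with `REPLICATE_API_TOKEN` set in the environment.

```python
from typing import Any, Dict

# Assumption: "owner/name:version" built from this page's title and Version ID.
MODEL_REF = (
    "zsxkib/voxtral:"
    "f5d491cbd58d6b048de5da796a4c6267621147b261cc72f02ebee4f39a94d5c5"
)

MODES = {"transcription", "understanding"}
MODEL_SIZES = {"mini", "small"}


def build_input(
    audio: str,
    mode: str = "transcription",
    prompt: str = "What can you tell me about this audio?",
    language: str = "Auto-detect",
    max_tokens: int = 500,
    model_size: str = "mini",
) -> Dict[str, Any]:
    """Hypothetical helper: validate values against the documented
    defaults and ranges, then return the input payload."""
    if not audio:
        raise ValueError("audio is required")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {sorted(MODEL_SIZES)}")
    if not 50 <= max_tokens <= 1000:
        raise ValueError("max_tokens must be between 50 and 1000")
    return {
        "mode": mode,
        "audio": audio,
        "prompt": prompt,  # documented as ignored in transcription mode
        "language": language,
        "max_tokens": max_tokens,
        "model_size": model_size,
    }


def run_voxtral(payload: Dict[str, Any]) -> Any:
    """Hypothetical helper: send the payload with the Replicate client.
    Imported lazily so the validation above works without the package."""
    import replicate  # third-party; reads REPLICATE_API_TOKEN

    return replicate.run(MODEL_REF, input=payload)
```

For example, `run_voxtral(build_input(audio_url, mode="understanding", prompt="Summarize this audio"))` would request understanding mode, where `audio_url` stands for any reachable audio file URL. Validating locally first avoids spending a prediction on an out-of-range `max_tokens`.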
Output Schema

Output

Type: string

Example Execution Logs
Using Voxtral Mini (3B) model
Mode: transcription
Auto-detecting language (using English as fallback)
Processing audio for transcription...
Generating transcription...
Transcription completed: 107 characters
Version Details
Version ID
f5d491cbd58d6b048de5da796a4c6267621147b261cc72f02ebee4f39a94d5c5
Version Created
July 24, 2025