zsxkib/voxtral ❓🖼️📝🔢 → 📝

▶️ 41 runs 📅 Jul 2025 ⚙️ Cog 0.16.0 🔗 GitHub 📄 Paper ⚖️ License
audio-understanding speech-to-text

About

Voxtral Mini (3B) + Small (24B) 🎙️ Speech transcription and audio understanding in 8 languages 🧠

Example Output

Prompt:

"What can you tell me about this audio?"

Output

من يتجه لحسم المعركة وما هي أهداف كل من تركيا وقصيد ومن وراءها الولايات المتحدة الأمريكية في هذه المواجهات؟

Performance Metrics

2.27s Prediction Time
263.56s Total Time
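The gap between the two figures is large, so a quick calculation helps put it in perspective. The page does not say what the remaining time covers; on Replicate, total time generally also includes queueing and container or model start-up, so a cold start is a plausible but assumed explanation.

```python
# Figures copied from the Performance Metrics above.
prediction_s = 2.27   # time spent running the model
total_s = 263.56      # wall-clock time for the whole request

# Time not spent in the prediction itself (assumed: queueing and start-up).
overhead_s = round(total_s - prediction_s, 2)

# Fraction of the total that was actual prediction work.
share = prediction_s / total_s

print(f"overhead: {overhead_s} s")
print(f"prediction share of total: {share:.2%}")
```

For this run the prediction accounts for under 1% of the total time, so repeated calls against an already warm model would likely finish much sooner.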
All Input Parameters
{
  "mode": "transcription",
  "audio": "https://replicate.delivery/pbxt/NPqAcUKAPImFd6Sva7qzqwdl4UCvCsSlUKJTVPAaovXzJeIQ/arabic_news_report.mp3",
  "prompt": "What can you tell me about this audio?",
  "language": "Auto-detect",
  "max_tokens": 500,
  "model_size": "mini"
}
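The request above could be reproduced from Python roughly as follows. This is a minimal sketch, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper name `transcribe` and its injectable `run` argument are illustrative and not part of the model's API; the version hash is the one listed under Version Details.

```python
from typing import Any, Callable, Optional

# Model reference: owner/name plus the version ID listed on this page.
MODEL_REF = (
    "zsxkib/voxtral:"
    "f5d491cbd58d6b048de5da796a4c6267621147b261cc72f02ebee4f39a94d5c5"
)

# Same values as the "All Input Parameters" example.
EXAMPLE_INPUT = {
    "mode": "transcription",
    "audio": "https://replicate.delivery/pbxt/NPqAcUKAPImFd6Sva7qzqwdl4UCvCsSlUKJTVPAaovXzJeIQ/arabic_news_report.mp3",
    "prompt": "What can you tell me about this audio?",
    "language": "Auto-detect",
    "max_tokens": 500,
    "model_size": "mini",
}


def transcribe(
    model_input: dict,
    run: Optional[Callable[..., Any]] = None,
) -> str:
    """Send the input to the model and return its output as text.

    `run` is a hypothetical injection point so the function can be
    exercised without network access; by default it is `replicate.run`.
    """
    if run is None:
        import replicate  # imported lazily; needs REPLICATE_API_TOKEN

        run = replicate.run
    output = run(MODEL_REF, input=model_input)
    # The output schema is a single string; str() is a defensive
    # conversion in case the client wraps the result.
    return str(output)
```

With a valid token, `transcribe(EXAMPLE_INPUT)` would be expected to return the transcription text. Passing a stub as `run` allows the function to be tested offline.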
Input Parameters
mode Default: transcription
Choose processing mode: 'transcription' converts speech to text, 'understanding' analyzes audio content using prompts.
audio (required) Type: string
Audio file to process.
prompt Type: string Default: What can you tell me about this audio?
Question or instruction for understanding mode (e.g., 'What is the speaker discussing?', 'Summarize this audio'). Ignored in transcription mode.
language Default: Auto-detect
Audio language. 'Auto-detect' works for most content, or choose a specific language for better accuracy.
max_tokens Type: integer Default: 500 Range: 50 - 1000
Maximum response length. Higher values allow longer outputs but increase processing time.
model_size Default: mini
Model selection: 'mini' (3B) is faster and uses less GPU memory, 'small' (24B) provides higher accuracy for complex audio.
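The constraints described above can be checked on the client before a request is sent. The sketch below is one possible way to do that; `build_input` is a hypothetical helper, not something the model provides. It validates only what this page documents: the two modes, the two model sizes, and the `max_tokens` range. The list of accepted `language` values is not given here, so that field is passed through unchecked.

```python
MODES = ("transcription", "understanding")
MODEL_SIZES = ("mini", "small")
DEFAULT_PROMPT = "What can you tell me about this audio?"


def build_input(
    audio: str,
    mode: str = "transcription",
    prompt: str = DEFAULT_PROMPT,
    language: str = "Auto-detect",
    max_tokens: int = 500,
    model_size: str = "mini",
) -> dict:
    """Validate arguments against the documented limits and return an input dict."""
    if not audio:
        raise ValueError("audio is required")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {MODEL_SIZES}")
    if not 50 <= max_tokens <= 1000:
        raise ValueError("max_tokens must be between 50 and 1000")

    payload = {
        "audio": audio,
        "mode": mode,
        "language": language,  # not validated: allowed values are not listed here
        "max_tokens": max_tokens,
        "model_size": model_size,
    }
    # The prompt is ignored in transcription mode, so only send it
    # when the model will actually use it.
    if mode == "understanding":
        payload["prompt"] = prompt
    return payload
```

For example, `build_input("https://example.com/clip.mp3", mode="understanding", prompt="Summarize this audio", model_size="small")` would yield a payload for the 24B model, while `max_tokens=10` would raise a `ValueError` before any request is made.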
Output Schema

Output

Type: string

Example Execution Logs
Using Voxtral Mini (3B) model
Mode: transcription
Auto-detecting language (using English as fallback)
Processing audio for transcription...
Generating transcription...
Transcription completed: 107 characters
Version Details
Version ID
f5d491cbd58d6b048de5da796a4c6267621147b261cc72f02ebee4f39a94d5c5
Version Created
July 24, 2025