mistralai/voxtral-mini-3b ❓🖼️📝🔢 → 📝

36 runs · Jul 2025 · Cog 0.15.11
audio-understanding speech-to-text speech-translation

About

Voxtral builds upon Ministral-3B, adding audio understanding capabilities.

Example Output

Prompt (ignored in this run, because mode is "transcription"):

"Please describe what you hear in this audio."

Output

This week, I traveled to Chicago to deliver my final farewell address to the nation, following in the tradition of presidents before me. It was an opportunity to say thank you. Whether we've seen eye-to-eye or rarely agreed at all, my conversations with you, the American people, in living rooms and schools, at farms and on factory floors, at diners and on distant military outposts, All these conversations are what have kept me honest, kept me inspired, and kept me going. Every day, I learned from you. You made me a better president, and you made me a better man. Over the course of these eight years, I've seen the goodness, the resilience, and the hope of the American people. I've seen neighbors looking out for each other as we rescued our economy from the worst crisis of our lifetimes. I've hugged cancer survivors who finally know the security of affordable health care. I've seen communities like Joplin rebuild from disaster, and cities like Boston show the world that no terrorist will ever break the American spirit. I've seen the hopeful faces of young graduates and our newest military officers. I've mourned with grieving families searching for answers, and I found grace in a Charleston church. I've seen our scientists help a paralyzed man regain his sense of touch, and our wounded warriors walk again. I've seen our doctors and volunteers rebuild after earthquakes and stop pandemics in their tracks. I've learned from students who are building robots and curing diseases and who will change the world in ways we can't even imagine. I've seen the youngest of children remind us of our obligations to care for our refugees, to work in peace, and above all, to look out for each other. That's what's possible when we come together in the slow, hard, sometimes frustrating, but always vital work of self-government. But we can't take our democracy for granted. All of us, regardless of party, should throw ourselves into the work of citizenship. 
Not just when there's an election. Not just when our own narrow interest is at stake. But over the full span of a lifetime. If you're tired of arguing with strangers on the Internet, try to talk with one in real life. If something needs fixing, lace up your shoes and do some organizing. If you're disappointed by your elected officials, then grab a clipboard, get some signatures, and run for office yourself. Our success depends on our participation, regardless of which way the pendulum of power swings. It falls on each of us to be guardians of our democracy, to embrace the joyous task we've been given to continually try to improve this great nation of ours. Because for all our outward differences, we all share the same proud title, citizen. It has been the honor of my life to serve you as president. Eight years later, I am even more optimistic about our country's promise, and I look forward to working along your side as a citizen for all my days that remain. Thanks, everybody. God bless you, and God bless the United States of America.

Performance Metrics

13.39s Prediction Time
13.40s Total Time
All Input Parameters
{
  "mode": "transcription",
  "audio": "https://replicate.delivery/pbxt/NNhCqgFluXfFWUWKLf9JUWjrHk82lNV1H89VJpfhrhLdiXE5/obama.mp3",
  "prompt": "Please describe what you hear in this audio.",
  "language": "en",
  "max_new_tokens": 1024
}
Input Parameters
mode Default: transcription
Processing mode
audio (required) Type: string
Audio file to process
prompt Type: string Default: Please describe what you hear in this audio.
Text prompt for audio understanding mode (ignored for transcription)
language Default: en
Language code for transcription
max_new_tokens Type: integer Default: 1024 Range: 1 - 32768
Maximum number of tokens to generate
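The parameters above can be assembled into a request payload as in the following minimal sketch. It assumes the official `replicate` Python client is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are hypothetical, and the model reference is an assumption formed from this page's title and version ID; check it against the model's Replicate page before relying on it.

```python
from typing import Any, Dict

# Version ID copied from the "Version Details" section of this page.
VERSION_ID = "f5a2a8bcd86d1eb11fc27e73e52be3446ba6902bfe4a5a74e92b079639949930"
# Assumption: the owner/name shown in the page title is the Replicate model reference.
MODEL_REF = "mistralai/voxtral-mini-3b:" + VERSION_ID


def build_input(
    audio: str,
    mode: str = "transcription",
    prompt: str = "Please describe what you hear in this audio.",
    language: str = "en",
    max_new_tokens: int = 1024,
) -> Dict[str, Any]:
    """Build the input dict using the defaults and range documented above."""
    if not audio:
        raise ValueError("audio is required")
    if not 1 <= max_new_tokens <= 32768:
        raise ValueError("max_new_tokens must be between 1 and 32768")
    return {
        "mode": mode,
        "audio": audio,
        "prompt": prompt,  # documented as ignored when mode is "transcription"
        "language": language,
        "max_new_tokens": max_new_tokens,
    }


def run_prediction(audio_url: str) -> str:
    """Send the payload to Replicate; the output schema is a single string."""
    import replicate  # imported lazily so build_input works without the client

    output = replicate.run(MODEL_REF, input=build_input(audio_url))
    return str(output)
```

Calling `run_prediction` with the `audio` URL from the example inputs should return a transcript string like the one shown under Example Output, although timing and exact wording may differ between runs.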
Output Schema

Output

Type: string

Version Details
Version ID
f5a2a8bcd86d1eb11fc27e73e52be3446ba6902bfe4a5a74e92b079639949930
Version Created
July 18, 2025