Speech-to-Text AI Models

zsxkib/audio-flamingo-3

Analyze audio and answer questions about speech, music, and sound effects. Accepts an audio file and an optional text pr...

🔊 • speech-to-text • music-understanding • audio-analysis • 3.1K runs

zsxkib/kimi-audio-7b-instruct

Transcribe speech and generate spoken replies from an audio input. Accepts an audio file (with an optional text prompt)...

📝 → 🔊 • speech-to-text • text-to-speech • audio-to-audio • 3.3K runs

mistralai/voxtral-mini-3b

Transcribe, translate, summarize, and answer questions from audio input, returning text. Supports dedicated transcriptio...

🔊 • speech-to-text • audio-understanding • 37 runs

ictnlp/llama-omni

Answer spoken queries with simultaneous text and speech output. Accepts a speech audio input and an optional instruction...

📝 → 🔊 • text-to-speech • text-generation • voice-assistant • 60.1K runs

nvidia/canary-qwen-2.5b

Converts audio into text transcriptions with timestamps and provides AI-powered analysis of the content. Achieves 5.63%...

🔊 • speech-to-text • audio-to-text • question-answering • 70.9K runs

lucataco/qwen2.5-omni-7b

Process text, images, audio, and video inputs to generate text and speech responses simultaneously. Features a novel Thi...

📝 → 🔊 • text-generation • image-to-text • video-to-text • 32.9K runs

jacksoby/whisperx

Transcribe speech to text from an audio file. Return structured transcripts with sentence/phrase segments plus per-word...

speech-to-text • 612 runs

vaibhavs10/incredibly-fast-whisper

Transcribe or translate speech to text from audio input. Run Whisper Large v3 with batched inference and Flash Attention...

speech-to-text • speaker-diarization • 18.5M runs

openai/gpt-4o-transcribe

Converts audio files to text transcriptions using GPT-4o for improved accuracy over traditional Whisper models. Supports...

speech-to-text • 59.7K runs

openai/whisper

Transcribe speech from audio into text. Perform multilingual automatic speech recognition with language detection and op...

speech-to-text • speech-translation • 137.1M runs

thomasmol/whisper-diarization

Transcribe audio with speaker diarization. Takes an audio input and returns a text transcript with per-speaker labels, s...

speech-to-text • speaker-diarization • 3.4M runs

openai/gpt-4o-mini-transcribe

Transcribes audio to text using GPT-4o mini, offering improved word error rate and better language recognition compared...

speech-to-text • 21.9K runs

victor-upmeet/whisperx

Transcribe audio to text with word-level timestamps and optional speaker diarization. Accepts an audio file with optiona...

speech-to-text • speaker-diarization • 4.6M runs

nvidia/parakeet-rnnt-1.1b

Transcribe English speech to text from an input audio file. Uses an RNNT FastConformer ASR model co-developed by NVIDIA...

speech-to-text • 18.9K runs

adidoes/whisperx-video-transcribe

Transcribe speech from online videos into timestamped text. Accepts a video URL (YouTube and other supported sites) and...

🎥 • speech-to-text • video-to-text • 19.6K runs

aihilums/sehatsanjha

Transcribe and structure spoken conversations from an audio input. Accept an audio file with optional session context (u...

speech-to-text • speaker-diarization • conversation-structuring • 37.7K runs

cjwbw/seamless_communication

Translate speech and text across 100+ languages, returning text and optionally synthesized speech. Accept audio or text...

📝 → 🔊 • speech-translation • speech-to-text • text-to-speech • 89.0K runs

daanelson/whisperx

Transcribe audio to text with fast, batched speech recognition. Accept an audio file as input and return a transcript wi...

speech-to-text • 89.6K runs

m1guelpf/whisper-subtitles

Generate subtitles from an audio file. Transcribe speech to text and return time-aligned subtitles (SRT or VTT), the ful...

speech-to-text • subtitle-generation • 73.8K runs

ibm-granite/granite-speech-3.3-8b

Transcribe and translate speech to text from audio input. Supports multilingual ASR in English, French, German, Spanish,...

speech-to-text • multilingual • 12.9K runs