Speech-to-Text AI Models - Page 2

jigsawstack/speech-to-text

Transcribe speech from audio or video into text. Outputs a full transcript with optional per-segment timestamps and spea...

speech-to-text • speaker-diarization • 6 runs

dmtanner/parakeet-tdt-0.6b-v3

Transcribe speech to text with optional word-level timestamps. Accept audio files and HLS m3u8 streams, with start_time...

speech-to-text • 1.1K runs

notdaniel/voxtral-small-24b-2507

Transcribe long-form, multilingual speech to text from audio input. Accepts an audio file and optional language code, an...

speech-to-text • 47 runs

rafaelgalle/whisper-diarization-advanced

Transcribe and diarize noisy multi-speaker audio. Accept audio files or base64 and output structured segments with text,...

speech-to-text • speaker-diarization • 107.5K runs

soykertje/whisper

Transcribe speech from audio to text. Handle multilingual transcription with automatic language detection and optional t...

speech-to-text • multilingual • subtitle-generation • 84.7K runs

zsxkib/voxtral

Transcribe speech and analyze audio content with Q&A and summarization across multiple languages. Accepts an audio file...

🔊 • speech-to-text • audio-analysis • 41 runs

cuuupid/markitdown

Convert PDFs, Office files, images, audio, HTML, and structured data to Markdown for LLM ingestion, indexing, and analys...

pdf-to-markdown • ocr • speech-to-text • 73.0K runs

lucataco/seamless_communication

Translate speech and text between languages, returning text or synthesized speech audio. Accepts audio or text input and...

📝 → 🔊 • speech-translation • speech-to-text • text-to-speech • 915 runs

romanfurman6/whisperx-multi-chunk

Transcribe long-form audio from multiple chunks into timestamped text. Accepts an array of audio chunks with total durat...

speech-to-text • speaker-diarization • 10 runs

isaacgv/vec2

Transcribe multilingual audio to text with time-aligned segments. Accepts an audio file and outputs segment- and word-le...

speech-to-text • language-detection • subtitle-generation • 106 runs

fictions-ai/autocaption

Add karaoke-style captions to a video. Input a video (optionally a transcript JSON) and get a captioned video plus an ed...

🎥 • video-auto-captioning • speech-to-text • 59.8K runs

zsxkib/canary-qwen-2.5b

Transcribe and analyze audio content with Canary-Qwen-2.5B, a speech-to-text model that provides perfect transcription w...

🔊 • speech-to-text • audio-analysis • transcription • 32 runs

carnifexer/whisperx

Transcribe speech to text from audio with optional word-level timestamps and alignment. Handle long-form recordings usin...

speech-to-text • word-level-timestamps • 13.5K runs

holywalley/stt_be_ctc

Transcribe Belarusian speech to text from an audio file. Accepts spoken Belarusian (be) audio and returns a Cyrillic tra...

speech-to-text • belarusian • 79 runs

skripnik/call-transcriber

Transcribe two-speaker phone calls with timestamps and speaker labels. Accepts two audio tracks (operator and customer)...

speech-to-text • call-transcription • 15 runs

razvandrl/subtitler

Generate subtitles from audio or video input. Transcribe speech to text and return a JSON transcript with segment start/...

🎥 • speech-to-text • video-auto-captioning • 2.3K runs

meronym/speaker-transcription

Transcribe English speech from an audio input and label speakers with diarization. Return structured JSON with timestamp...

🔊 • speech-to-text • speaker-diarization • audio-embedding • 28.3K runs

sparkdoaz/www

Transcribe audio to text with speaker diarization and word-level timestamps. Takes an audio file as input and returns a...

speech-to-text • speaker-diarization • 171 runs

turian/insanely-fast-whisper-with-video

Transcribe or translate speech from audio files and videos to text. Accept audio or video input and return a transcript...

🎥 • speech-to-text • video-to-text • speaker-diarization • 8.6M runs

victor-upmeet/whisperx-a40-large

Transcribe hours-long audio to text with WhisperX large-v3, generating segment timestamps and optional word-level alignm...

speech-to-text • speaker-diarization • 710.2K runs