speech-to-text AI Models - Page 4

mattsegal/incredibly-fast-whisper-distil-large-v2

Transcribe speech from an audio file to text. Leverage OpenAI Whisper large-v3 with an implementation optimized for fast...

speech-to-text • whisper • 134 runs

douwantech/faster-whisper

Transcribe audio to time-aligned SRT subtitles. Accepts an audio file as input and returns an SRT subtitle file with tim...

speech-to-text • subtitle-generation • 60 runs

loginethu/whisper-a100

Transcribe and optionally translate multilingual audio to English text. Accepts an audio file and returns transcripts as...

speech-to-text • subtitle-generation • 53 runs

daanelson/whisper-jax-hindi

Transcribe Hindi speech from audio into text. Takes an audio file as input and returns Hindi transcripts for tasks like...

speech-to-text • hindi • 84 runs

daanelson/whisper-train-preprocessor

Prepare datasets for fine-tuning Whisper ASR models. Accepts either tarballs of audio files and matching text transcript...

dataset-preprocessing • whisper-fine-tuning • asr-dataset • 39 runs

lucataco/voxtral-mini-3b

Transcribe and understand audio with Voxtral Mini 3B, an advanced model that builds upon Ministral-3B. It excels in spee...

🔊 • audio-transcription • audio-understanding • speech-to-text • 27 runs

sabuhigr/sabuhi-model-v2

Transcribe speech to text from an audio input. Optionally translate to English, perform speaker diarization, and bias re...

speech-to-text • speaker-diarization • speech-translation • 33.0K runs

sabuhigr/sabuhi-model

Transcribe multilingual audio with speaker diarization and channel separation. Accepts an audio file and outputs text tr...

speech-to-text • speaker-diarization • 25.5K runs

microsoft/phi-4-multimodal-instruct

Generate text responses from text, image, and audio inputs. Perform image captioning and visual question answering, OCR,...

📝 → 📝 • text-generation • speech-to-text • image-captioning • 13.2K runs

shreejalmaharjan-27/tiktok-short-captions

Auto-caption videos with TikTok-style on-screen subtitles. Transcribe speech using Whisper large-v3 with automatic langu...

🎥 • video-auto-captioning • 196.1K runs

idan054/sarra-video-maker-v1

Add autogenerated, stylized subtitles to a video. Input a video (optional: background music and/or a word‑level transcri...

🎥 • video-auto-captioning • video-editing • speech-to-text • 2 runs

subformer/meta-omnilingual-asr-1b

Transcribe speech to text across 1,693 languages. Accepts short audio clips and returns a text transcript, with automati...

speech-to-text • multilingual-asr • 6 runs

subformer/meta-omnilingual-asr-7b

Transcribe speech to text from short audio clips in 1,693 languages. Accept audio input and optionally a specified langu...

speech-to-text • multilingual • language-detection • 2 runs

twangodev/qwenasr

Converts audio to text and provides speech-audio alignment using QwenASR. Supports two modes: transcription to convert s...

speech-to-text • 1.6K runs

lucataco/interactiveomni-8b

Processes multiple inputs simultaneously including images, audio, text, and video to generate coherent text and speech r...

📝 → 🔊 • text-generation • image-to-text • video-to-text • 86 runs

xai/grok-speech-to-text

Transcribes audio files to text with support for 25 languages and automatic language detection. Provides word-level time...

speech-to-text • speaker-diarization • 2.1K runs

fabiwlf/cog-resemble-enhance

Enhances and optimizes audio files containing speech. Uses CFM (Conditional Flow Matching) with configurable solvers incl...

🔊 • audio-denoising • speech-to-text • audio-to-audio • 493 runs