jigsawstack/speech-to-text
Transcribe speech from audio or video into text. Outputs a full transcript with optional per-segment timestamps and spea...
Found 72 models (showing 21-40)
Transcribe speech to text with optional word-level timestamps. Accept audio files and HLS m3u8 streams, with start_time...
Transcribe long-form, multilingual speech to text from audio input. Accepts an audio file and optional language code, an...
Transcribe and diarize noisy multi-speaker audio. Accept audio files or base64 and output structured segments with text,...
Transcribe speech from audio to text. Handle multilingual transcription with automatic language detection and optional t...
Transcribe speech and analyze audio content with Q&A and summarization across multiple languages. Accepts an audio file...
Convert PDFs, Office files, images, audio, HTML, and structured data to Markdown for LLM ingestion, indexing, and analys...
Translate speech and text between languages, returning text or synthesized speech audio. Accepts audio or text input and...
Transcribe long-form audio from multiple chunks into timestamped text. Accepts an array of audio chunks with total durat...
Transcribe multilingual audio to text with time-aligned segments. Accepts an audio file and outputs segment- and word-le...
Add karaoke-style captions to a video. Input a video (optionally a transcript JSON) and get a captioned video plus an ed...
Transcribe and analyze audio content with Canary-Qwen-2.5B, a speech-to-text model that provides perfect transcription w...
Transcribe speech to text from audio with optional word-level timestamps and alignment. Handle long-form recordings usin...
Transcribe Belarusian speech to text from an audio file. Accepts spoken Belarusian (be) audio and returns a Cyrillic tra...
Transcribe two-speaker phone calls with timestamps and speaker labels. Accepts two audio tracks (operator and customer)...
Generate subtitles from audio or video input. Transcribe speech to text and return a JSON transcript with segment start/...
Transcribe English speech from an audio input and label speakers with diarization. Return structured JSON with timestamp...
Transcribe audio to text with speaker diarization and word-level timestamps. Takes an audio file as input and returns a...
Transcribe or translate speech from audio files and videos to text. Accept audio or video input and return a transcript...
Transcribe hours-long audio to text with WhisperX large-v3, generating segment timestamps and optional word-level alignm...