
zsxkib/audio-flamingo-3
Answer questions about audio with text output, performing step-by-step reasoning across speech, music, and sound effects...
Found 65 models (showing 1-20)
Transcribe audio and generate spoken or textual responses from an audio input. Accepts an audio clip and optional text p...
Transcribe, translate, and understand speech from audio input, returning text. Switch between transcription and understa...
Answer spoken questions with both text and synthesized speech. Accept a speech audio clip and an optional instruction pr...
Transcribe long-form audio to text with optional timestamps and prompt-driven analysis. Accept an audio file and return...
Chat and analyze across text, images, audio, and video, returning text responses and optional synthesized speech. Accept...
Transcribe spoken audio to text. Accepts an audio file and returns a structured transcript with segments plus per-word t...
Transcribe audio to text at very high speed, with optional translation and speaker diarization. Accepts an audio input a...
Transcribe speech to text from an audio input. Accept an audio file with optional language hint and prompt to guide styl...
Transcribe speech from audio to text. Run Whisper large-v3 for multilingual automatic speech recognition (ASR), optional...
Transcribe audio with speaker diarization. Accepts audio files or base64 input and returns a structured transcript with...
Transcribe speech audio to text. Uses GPT-4o mini for multilingual recognition with improved word error rate versus orig...
Transcribe speech audio to text with word-level timestamps and optional speaker diarization. Takes an audio file and out...
Transcribe English speech to text from an input audio file. Leverage an RNNT ASR with a FastConformer encoder for robust...
Transcribe speech from online videos into text. Accepts a video input from supported sites and returns a JSON transcript...
Transcribe multi-speaker audio into a structured dialogue with speaker labels and section headers, returning JSON. Accep...
Translate speech and text across languages, returning translated text and optionally synthesized speech audio. Supports...
Transcribe audio to text. Accepts an audio input and outputs either plain text or segmented transcripts with start/end t...
Generate SRT or VTT subtitles and a transcript from an audio input. Transcribe multilingual speech with Whisper and retu...
Transcribe and translate speech from audio into text. Accept audio and an optional prompt and return text transcripts or...