
zsxkib/audio-flamingo-3
Answer questions about audio with text output, performing step-by-step reasoning across speech, music, and sound effects...
Transcribe audio and generate spoken or textual responses from an audio input. Accepts an audio clip and optional text p...
Transcribe, translate, and understand speech from audio input, returning text. Switch between transcription and understa...
Answer spoken questions and return both text and spoken responses. Accepts input speech audio (with an optional instruct...
Transcribe long-form audio to text with optional timestamps and prompt-driven analysis. Accept an audio file and return...
Chat and analyze across text, images, audio, and video, returning text responses and optional synthesized speech. Accept...
Transcribe spoken audio to text. Accepts an audio file and returns a structured transcript with segments plus per-word t...
Transcribe or translate audio to text. Accepts an audio input and returns a transcript with chunk- or word-level timesta...
Transcribe audio to text using GPT-4o. Accepts an audio file with optional language hint and prompt for style or segment...
Transcribe speech from an audio file to text. Support multilingual speech recognition with automatic language detection...
Transcribe audio to text with speaker diarization. Accepts an audio file, direct URL, or base64 audio and returns JSON w...
Transcribe audio to text. Accepts an audio file with an optional language hint (ISO-639-1) and an optional prompt to gui...
Transcribe speech audio to text with word-level timestamps and optional speaker diarization. Takes an audio file and out...
Transcribe English speech from audio into text (ASR). Handle noisy backgrounds and silent segments, outputting lowercase...
Transcribe speech from online videos into text. Accepts a video input from supported sites and returns a JSON transcript...
Transcribe multi-speaker audio into a structured dialogue with speaker labels and section headers, returning JSON. Accep...
Translate and transcribe speech and text across many languages, returning text and optionally synthesized speech. Suppor...
Transcribe audio to text with fast, batched WhisperX ASR, including word-level timestamps and speaker diarization. Accep...
Generate SRT or VTT subtitles and a full transcript from an audio file. Use OpenAI Whisper with selectable models (tiny....
Transcribe and translate speech from audio into text. Accept audio and an optional prompt and return text transcripts or...