zsxkib/audio-flamingo-3
Analyze audio and answer questions about speech, music, and sound effects. Accepts an audio file and an optional text pr...
Found 72 models (showing 1-20)
Transcribe speech and generate spoken replies from an audio input. Accepts an audio file (with an optional text prompt)...
Transcribe, translate, summarize, and answer questions from audio input, returning text. Supports dedicated transcriptio...
Answer spoken queries with simultaneous text and speech output. Accepts a speech audio input and an optional instruction...
Transcribe audio to text with optional timestamps and LLM-powered analysis and question answering. Process long-form rec...
Chat and reason across text, images, audio, and video, outputting text and synthesized speech. Accept text prompts with...
Transcribe speech to text from an audio file. Return structured transcripts with sentence/phrase segments plus per-word...
Transcribe or translate speech to text from audio input. Run Whisper Large v3 with batched inference and Flash Attention...
Transcribe speech to text from an audio input using GPT-4o. Accepts an audio file and outputs a text transcript, with im...
Transcribe speech from audio into text. Perform multilingual automatic speech recognition with language detection and op...
Transcribe audio with speaker diarization. Takes an audio input and returns a text transcript with per-speaker labels, s...
Transcribe audio to text. Accepts an audio input and returns a text transcript using GPT-4o mini. Supports multilingual...
Transcribe audio to text with word-level timestamps and optional speaker diarization. Accepts an audio file with optiona...
Transcribe English speech to text from an input audio file. Uses an RNNT FastConformer ASR model co-developed by NVIDIA...
Transcribe speech from online videos into timestamped text. Accepts a video URL (YouTube and other supported sites) and...
Transcribe and structure spoken conversations from an audio input. Accept an audio file with optional session context (u...
Translate speech and text across 100+ languages, returning text and optionally synthesized speech. Accept audio or text...
Transcribe audio to text with fast, batched speech recognition. Accept an audio file as input and return a transcript wi...
Generate subtitles from an audio file. Transcribe speech to text and return time-aligned subtitles (SRT or VTT), the ful...
Transcribe and translate speech to text from audio input. Supports multilingual ASR in English, French, German, Spanish,...