zsxkib/audio-flamingo-3
Analyze audio and answer questions about speech, music, and sound effects. Accepts an audio file and an optional text pr...
Found 11 models (showing 1-11)
Analyze audio and answer questions about speech, music, and sound effects. Accepts an audio file and an optional text pr...
Extract lip-sync mouth cues from speech audio and return time-aligned JSON for animation. Accepts common audio formats (...
Separate speakers in audio into labeled time segments. Accepts an audio file and optional constraints on the number of s...
Transcribe audio to text with optional timestamps and LLM-powered analysis and question answering. Process long-form rec...
Generate music tags from audio files using state-of-the-art CNN-based models. Supports multiple model variants including...
Separate audio mixtures into individual tracks including bass, drums, vocals, and other components.
Transcribe and analyze audio content with Canary-Qwen-2.5B, a speech-to-text model that provides perfect transcription w...
Identify who spoke when in an audio file. Takes a single audio recording as input and returns a diarization JSON with sp...
Transcribe speech and analyze audio content with Q&A and summarization across multiple languages. Accepts an audio file...
Transcribe piano audio into MIDI format and generate a Synthesia-style video visualization from the transcription. Utili...
Recognize and predict human emotions from speech audio files. Utilizes machine learning and deep learning algorithms to...