daanelson/imagebind
Generate shared embeddings for text, images, and audio for cross-modal retrieval and similarity search. Accepts a text s...
Analyze music structure from an audio file. Return tempo (BPM), beats, downbeats, segment boundaries, and functional seg...
Analyze music to extract song structure, tempo (BPM), and downbeats, and optionally separate stems. Takes an audio file...
Identify and segment speakers in audio recordings. Takes an audio file as input and returns JSON with speaker-labeled ti...
Identify and segment speakers in an audio recording. Accepts an audio file and outputs JSON with time-stamped segments (...
Transcribe English speech from an audio input and label speakers with diarization. Return structured JSON with timestamp...