daanelson/imagebind
Generate shared embeddings for text, images, and audio for cross-modal retrieval and similarity search. Accepts a text s...
Found 6 models (showing 1-6)
Analyze music structure from an audio file. Return tempo (BPM), beats, downbeats, segment boundaries, and functional seg...
Analyze music to extract song structure, tempo (BPM), and downbeats, and optionally separate stems. Takes an audio file...
Segment speakers in audio recordings. Take an audio file and return time-stamped speech segments labeled by speaker, the...
Identify and segment speakers in an audio recording. Accepts an audio file and outputs JSON with time-stamped segments (...
Transcribe English speech from an audio input and label speakers with diarization. Return structured JSON with timestamp...