daanelson/imagebind
Embed text, images, and audio into a shared vector space for cross-modal retrieval and similarity search. Accepts a text...
Found 4 models (showing 1-4)
Embed text, images, and audio into a shared vector space for cross-modal retrieval and similarity search. Accepts a text...
Analyze music structure from an audio file. Predict tempo (BPM), beats, downbeats, functional segment boundaries, and se...
Analyze music to extract song structure, tempo (BPM), and downbeats from an audio file. Return a JSON timeline of sectio...
Segment speakers in an audio recording and return time-stamped labels and per-speaker embeddings. Takes an audio file as...