aliakbarghayoori/dfn5b-clip-vit-h-14-384
Embed images and text into a shared CLIP vector space for similarity search, cross-modal retrieval, and zero-shot classification.
Found 4 models (showing 1-4)
Create 768-dimensional CLIP (ViT-L/14) embeddings from text or images. Embed both modalities into a shared vector space.
Create multilingual text and image embeddings for cross-modal retrieval and semantic search. Accepts text (up to 8192 tokens)...
Compute CLIP embeddings for batches of text and images. Accept multiple newline-separated inputs and return one vector embedding per input.
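All of the models above place text and images in one shared vector space, so similarity search and cross-modal retrieval reduce to comparing vectors, typically by cosine similarity. A minimal sketch with stand-in 768-dimensional vectors (in practice the vectors would come from one of the listed models, e.g. `aliakbarghayoori/dfn5b-clip-vit-h-14-384`; the vectors below are random placeholders, not real embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in 768-dim vectors; real ones would be returned by the embedding model
# for a text query and candidate images.
rng = np.random.default_rng(0)
text_vec = rng.normal(size=768)
image_match = text_vec + 0.1 * rng.normal(size=768)  # image close to the query
image_other = rng.normal(size=768)                   # unrelated image

# The best match is simply the candidate with the highest cosine similarity.
print(cosine_similarity(text_vec, image_match))  # high, near 1
print(cosine_similarity(text_vec, image_other))  # low, near 0
```

Zero-shot classification works the same way: embed one text prompt per class label and pick the label whose vector is closest to the image's vector.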