
Found 10 models (showing 1-10)

openai/clip
Generate joint text and image embeddings for semantic search and cross-modal retrieval. Accepts a single text string or...
Embed text, images, and audio into a shared vector space for cross-modal retrieval and similarity search. Accepts a text...
Generate dense embeddings for document screenshots and text queries to power document and webpage retrieval. Encode scre...
Generate text and image embeddings. Produce 768-dimensional vectors from text or images using CLIP ViT-L/14 for semantic...
Compute CLIP ViT-L/14 embeddings from text and images for semantic search, cross-modal retrieval, and zero-shot classifi...
Compute 512-dimensional embeddings from images and/or text for similarity search, cross-modal retrieval, clustering, ded...
Create multilingual text and image embeddings for cross-modal search, retrieval, and similarity. Accept text (up to 8192...
Embed images and text into a shared CLIP vector space for similarity search and zero-shot classification. Accepts lists...
Generate image embeddings from an input image for use with the Segment Anything Model (SAM) ViT-H. Accepts a single imag...
Compute CLIP embeddings for batches of text and images. Accepts newline-separated text strings and images (including bas...
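
Whichever model is chosen, usage follows the same pattern: embed queries and candidates into the shared vector space, L2-normalize, and rank by cosine similarity. Below is a minimal sketch of that pattern using the open CLIP ViT-L/14 weights through Hugging Face transformers; this is an assumption for illustration only — the hosted models listed above expose their own clients and input formats, and cat.jpg is a hypothetical local file.

```python
# Minimal sketch: joint text/image embeddings with CLIP ViT-L/14 via
# Hugging Face transformers (assumed stand-in for the hosted models above).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"  # ViT-L/14 -> 768-dim embeddings
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

texts = ["a photo of a cat", "a photo of a dog"]
image = Image.open("cat.jpg")  # hypothetical local file

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    text_embeds = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    image_embeds = model.get_image_features(pixel_values=inputs["pixel_values"])

# L2-normalize so cosine similarity reduces to a dot product.
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

similarity = image_embeds @ text_embeds.T  # shape: (1 image, 2 texts)
print(similarity)
```

The normalize-then-dot-product step is the same whether the vectors come from CLIP, a multilingual variant, or an audio-capable model; only the encoder call changes.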