
jyoung105/moondream
Answer questions about images from an image and text prompt, returning text. Support visual question answering, image ca...
Found 126 models (showing 81-100)
Answer questions about images from an image and text prompt, returning text. Support visual question answering, image ca...
Extract structured UI elements from screenshots. Accepts an image of a desktop, mobile, or web interface and returns a s...
Find matching web pages and visually similar images from an input image by scraping Google Lens results. Takes a single...
Answer questions about images from an image and a text prompt, returning text. Perform visual question answering and gro...
Extract text and structured data from images and multi-page PDFs. Provide an image or PDF plus a prompt string for scene...
Tag images with multiple keywords. Takes a single image as input and outputs a list of textual tags describing objects,...
Caption images. Input a single image and generate a natural-language description using visual attention that focuses on...
Answer questions about images from a text prompt. Accepts an image and a natural-language prompt and returns generated t...
Generate grounded image captions from an input image, linking phrases to detected objects with bounding boxes. Accepts a...
Analyze images to caption content, detect objects, segment regions, and extract text (OCR). Accepts an image and an opti...
Extract LaTeX-formatted math from images or PDFs and return Markdown text. Takes an image (or PDF) containing equations...
Answer questions about images and documents and generate captions from an image plus a text prompt, returning text. Sele...
Caption images and answer visual questions from a text prompt and an optional image, returning text. Support long-contex...
Tag and segment objects in images, returning labels, bounding boxes, and pixel masks. Accepts an image as input and outp...
Answer questions about images and extract information, returning text. Accepts an image plus a text prompt and outputs t...
Answer questions and caption images from one to three input images, returning text. Handle visual question answering (VQ...
Answer questions about images and text with step-by-step reasoning. Accepts a text prompt and an optional image, and out...
Caption and answer questions about images. Accepts an image and a natural-language prompt, returning text descriptions o...
Caption images. Provide an image and get a concise natural-language description of its contents for alt text, content ta...
Classify 37 dog and cat breeds from an input image, returning the predicted breed label. Uses a fine-tuned ResNet18 for...