jyoung105/moondream
Answer questions about images from a text prompt, returning a text response. Accepts an input image and a prompt and out...
Found 150 models (showing 81-100)
Answer questions about images from a text prompt, returning a text response. Accepts an input image and a prompt and out...
Parse GUI screenshots into structured UI elements with bounding boxes and captions. Accepts an image of a desktop or mob...
Search the web for matches to an input image using Google Lens and return structured search results. Accept an image inp...
Answer questions about images from a text prompt and return a text response. Accept a single image plus a natural-langua...
Extract text and structured data from images and multi-page PDFs using visual OCR and layout analysis. Accept an image o...
Tag images with multiple keywords. Takes a single image as input and outputs a list of textual tags describing objects,...
Caption images. Accepts an image and outputs a short natural-language description using visual attention to focus on sal...
Answer questions about images from an image and a text prompt, returning text. Generate captions, short answers, and exp...
Caption images with grounded object localization. Take an image as input and return a brief or detailed natural-language...
Analyze images to generate captions, extract OCR text, detect objects, and produce segmentation masks and region proposa...
Convert images or PDFs containing mathematical notation into Markdown/LaTeX text. Accept an image input and return a tex...
Caption images and perform visual question answering from an image and a text prompt, returning a text response. Choose...
Caption images and answer visual questions from a text prompt and optional image, returning text. Support long-context i...
Tag and segment objects in images, returning labels, bounding boxes, and pixel masks. Accepts an image as input and outp...
Answer questions about images, documents, charts, and tables. Takes an image and a text prompt and returns text. Support...
Answer questions about images and documents. Accepts 1–3 images plus an instruction and returns text for tasks like visu...
Answer questions about images and text with multimodal reasoning. Takes a text prompt with an optional image and outputs...
Caption images and answer visual questions from an input image and text prompt. Accepts an image and a prompt; outputs t...
Caption images. Takes a single image as input and returns a concise natural-language description of the scene, objects,...
Classify 37 dog and cat breeds from an input image, returning the predicted breed label. Uses a fine-tuned ResNet18 for...