
cjwbw/cogvlm
Caption images and answer visual questions from an input image and a text query. Return text responses for VQA, image de...
Found 119 models (showing 21-40)
Answer questions about images and return text. Accepts an image and a natural-language question, and outputs textual res...
Generate captions and answer visual questions for images and videos from a text prompt. Accepts a single image or a vide...
Caption images and answer visual questions from an image and a text prompt. Takes a single image plus an instruction (e....
Caption images and answer visual questions from an image and a text prompt, returning text. Add visual understanding to...
Answer questions about images and generate captions from a text prompt. Accept a single image and a natural-language que...
Answer visual questions and caption images from an input image and text prompt, returning text. Perform single-turn visu...
Caption images and answer questions from an image and a text prompt, returning text. Handle visual question answering (V...
Answer questions about an image from a text prompt and return text. Perform visual question answering, image captioning,...
Answer questions about images. Accepts a single image and a text prompt, and outputs text that captions the image or res...
Extract text from images and PDFs with multilingual OCR. Run line-level text detection or full OCR on selected pages and...
Convert images of mathematical equations into LaTeX code. Accepts an image input and returns a LaTeX string, performing...
Extract structured information and answer questions from documents, charts, tables, diagrams, and general images. Accept...
Caption images. Takes a single image input and returns a concise natural-language description of the scene, suitable for...
Auto-tag images with keyword labels and confidence scores. Takes an image as input and returns a ranked list of general...
Answer questions about images. Takes an image and a natural-language question as input and returns a text answer. Suppor...
Extract document layout and text from an image into structured JSON. Accepts a scanned page or document image and return...
Answer questions about images and generate captions from an image and a text prompt. Accepts a single image plus a natur...
Generate Stable Diffusion-ready text prompts from an input image. Analyze visuals with CLIP Interrogator (CLIP + BLIP) t...
Answer questions about images and documents from an image and a text prompt, returning text. Handle visual question answ...