cjwbw/cogvlm
Caption images and answer visual questions from an input image and text query, returning a text response. Handle general...
Answer questions about images and return text. Accepts an image and a natural-language question and outputs a textual an...
Analyze images and videos to generate captions, answer visual questions, and summarize scenes. Accepts an image or a vid...
Caption images and answer questions about images. Takes an image and a text prompt as input and returns text, enabling i...
Answer questions and caption images from an input image and text prompt. Accept an image plus a natural-language query a...
Answer questions about images and generate text descriptions. Accepts an image and a natural-language prompt; returns te...
Answer questions about images from a single image input and a text prompt, returning a single-turn text response. Perfor...
Answer questions about images and generate captions from an input image and text prompt. Output free-form text grounded...
Answer questions about images from an image and text prompt, returning text. Perform visual question answering, image ca...
Answer questions about an image and generate captions, returning text based on visual content. Provide a single image an...
Extract text from images and PDFs in 90+ languages. Accept an image or multi-page PDF, a selected language list, and a p...
Extract LaTeX code from images of mathematical equations and expressions. Takes a single image as input and returns the...
Analyze documents and images from one or more image inputs plus a text prompt, returning text captions, OCR, and answers...
Generate text captions from images. Accepts a single image and returns a concise natural-language description of its con...
Auto-tag images with labels and confidence scores. Takes an image as input and returns a list of tags with associated co...
Answer questions about images. Takes an image and a natural-language question and returns text, enabling visual question...
Extract structured document layout and text from an image input and return a single JSON output. Parse page elements wit...
Answer questions about images and generate captions from an image and a text prompt, outputting text. Perform visual que...
Generate a Stable Diffusion-ready text prompt from an input image. Analyze content and style using CLIP Interrogator and...
Answer questions about images and documents from an image and a text prompt, returning text. Handle visual question answ...