cjwbw/cogvlm
Analyzes images and answers questions about them using a visual language model. Takes an image and a text query as input...
Found 166 models (showing 21-40)
Analyzes images and answers questions about them using a visual language model. Takes an image and a text query as input...
Answers questions about images through multimodal understanding. Takes an image and a text question as input and generat...
Analyze images and videos to generate captions, answer visual questions, and summarize scenes. Accepts an image or a vid...
Caption images and answer questions about images. Takes an image and a text prompt as input and returns text, enabling i...
Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...
Answer questions about images and generate text descriptions. Accepts an image and a natural-language prompt; returns te...
Analyzes images and answers questions about visual content with enhanced reasoning capabilities. Takes an image and text...
Analyzes images and responds to text prompts about visual content. Takes an image and a text prompt as input, then gener...
Answer questions about images from an image and text prompt, returning text. Perform visual question answering, image ca...
Answers questions about images using natural language. Takes an image and text prompt as input and generates contextual...
Extract text from images and PDFs in 90+ languages. Accept an image or multi-page PDF, a selected language list, and a p...
Extract LaTeX code from images of mathematical equations and expressions. Takes a single image as input and returns the...
Analyze documents and images from one or more image inputs plus a text prompt, returning text captions, OCR, and answers...
Generate text captions from images. Accepts a single image and returns a concise natural-language description of its con...
Auto-tag images with labels and confidence scores. Takes an image as input and returns a list of tags with associated co...
Analyzes images and answers questions about them using a unified autoregressive framework for multimodal understanding....
Extract structured document layout and text from an image input and return a single JSON output. Parse page elements wit...
Answer questions about images and generate captions from an image and a text prompt, outputting text. Perform visual que...
Generate a Stable Diffusion-ready text prompt from an input image. Analyze content and style using CLIP Interrogator and...
Answer questions about images and documents from an image and a text prompt, returning text. Handle visual question answ...