yorickvp/llava-v1.6-mistral-7b
Answer visual questions and caption images from a text prompt and an optional image, returning text. Support multi-turn...
Found 67 models (showing 21-40)
Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...
Generate captions for images using a simple GPT-5-mini wrapper. Input an image and receive a descriptive text output tha...
Generate text and code from a prompt, with optional image analysis for captions and visual reasoning. Accepts a text pro...
Answer questions about images. Takes an image and a natural-language question and returns text, enabling visual question...
Generate and analyze text with optional image input. Accept a text prompt and an optional image and return text, support...
Answer questions about images, caption scenes, and localize entities with bounding boxes. Accept a ChatML-formatted prom...
Analyze images and answer questions from an image plus text prompt, returning text. Handle visual question answering (VQ...
Caption images and answer visual questions from a text prompt and optional image, returning text. Support long-context i...
Answer questions about images, documents, charts, and tables. Takes an image and a text prompt and returns text. Support...
Caption images and answer visual questions from an input image and text prompt. Accepts an image and a prompt; outputs t...
Caption images and answer visual questions from an input image and text query, returning a text response. Handle general...
Answer questions about images from a single image input and a text prompt, returning a single-turn text response. Perfor...
Answer questions about images. Accepts an image and an optional text prompt and returns a text response for visual quest...
Caption images and answer visual questions from an input image and text prompt. Accept an image plus a question or instr...
Answer questions and caption images from an input image and text prompt. Accept an image plus a natural-language query a...
Answer questions about images and generate image-grounded text from an image and a text prompt. Perform visual question...
Generate text from text and image inputs. Perform question answering, reasoning, document summarization, data analysis,...
Answer questions about images and documents from an image and a text prompt, returning text. Handle visual question answ...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...