yorickvp/llava-v1.6-mistral-7b
Answer questions about images and generate captions from a text prompt and an optional image, returning text. Perform vi...
Answer questions about images and generate captions from a text prompt and an optional image, returning text. Perform vi...
Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...
Generate captions for images using a simple GPT-5-mini wrapper. Input an image and receive a descriptive text output tha...
Generates text responses based on prompts and can analyze images. Excels at coding tasks with state-of-the-art performan...
Analyzes images and answers questions about them using a unified autoregressive framework for multimodal understanding....
Generate text based on text prompts and optional image inputs. This multimodal language model handles both text and imag...
Answer questions about images, caption scenes, and localize entities with bounding boxes. Accept a ChatML-formatted prom...
Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...
Caption images and answer visual questions from a text prompt and optional image, returning text. Support long-context i...
Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Takes an imag...
Analyzes images and generates text descriptions or responses to prompts about visual content. Processes diverse image ty...
Analyzes images and answers questions about them using a visual language model. Takes an image and a text query as input...
Analyzes images and answers questions about visual content with enhanced reasoning capabilities. Takes an image and text...
Analyzes images and answers questions about visual content through multimodal conversation. Designed as a foundation mod...
Caption images and answer visual questions from an input image and text prompt. Accept an image plus a question or instr...
Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...
Answer questions about images and generate image-grounded text from an image and a text prompt. Perform visual question...
Generates text responses from text prompts and optional image inputs. Supports multimodal capabilities for analyzing and...
Answer questions about images and documents from an image and a text prompt, returning text. Handle visual question answ...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...