lucataco/qwen-vl-chat
Answer questions about images. Accept an image and a text prompt and return text outputs for visual question answering,...
Found 73 models (showing 1-20)
Answer questions about images and extract text from images. Takes an image and a text prompt and returns a text response...
Analyze images to identify unusual or noteworthy elements based on textual prompts. This model processes an input image...
Caption images and answer visual questions from an input image and text query, returning a text response. Handle general...
Analyze images and answer questions about them in natural language. Accepts a text prompt and an optional image and retu...
Analyze images and generate text responses to prompts. Accepts an image and a text prompt, and outputs text for visual q...
Analyze images and answer questions from an input image and text instruction, returning text. Support visual question an...
Generate and reason over text from prompts or multi-turn chat, with optional image inputs for vision understanding and i...
Generate and reason over text and code from a prompt, with optional image input for captioning and visual analysis, and...
Classify the safety of multimodal inputs (image and user message) for content moderation. Accepts an image (required) an...
Predicts the age of a person in an input image using CLIP by computing the similarity between age-related prompts and th...
Predicts age from an input image using a CLIP model.
Generate text from prompts or chat messages, with optional image analysis for multimodal reasoning. Handle instruction f...
Generate text from prompts or chat and analyze images to produce captions and grounded answers. Accepts text and optiona...
Solve complex reasoning tasks and generate text responses from prompts, multi-turn chat messages, and images. Accept a s...
Generate and reason over text with optional image inputs, returning text outputs. Handle long-context tasks with a 200k-...
Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...
Generate and chat in natural language from text prompts, with optional image inputs for visual understanding and image-t...
Generate text and code from a prompt, with optional image analysis for captions and visual reasoning. Accepts a text pro...
Generate and reason over text from prompts or chat messages, with optional image inputs for multimodal understanding; ou...