lucataco/qwen-vl-chat
Answer questions about images. Accept an image and a text prompt and return text outputs for visual question answering,...
Found 76 models (showing 1-20)
Answer questions about images and extract text from images. Takes an image and a text prompt and returns a text response...
Analyze images to identify unusual or noteworthy elements based on textual prompts. This model processes an input image...
Caption images and answer visual questions from an input image and text query, returning a text response. Handle general...
Analyze images and answer questions about them in natural language. Accepts a text prompt and an optional image and retu...
Analyze images and generate text responses to prompts. Accepts an image and a text prompt, and outputs text for visual q...
Analyze images and answer questions from an input image and text instruction, returning text. Support visual question an...
Generate and reason over text for coding, question answering, and multi-step problem solving. Accepts text prompts or ch...
Generate and reason over text and code from a prompt, with optional image input for captioning and visual analysis, and...
Classify the safety of multimodal inputs (image and user message) for content moderation. Accepts an image (required) an...
Predicts the age of a person in an input image using CLIP by computing the similarity between age-related prompts and th...
Predicts age from an input image using a CLIP model.
Generate text and code from prompts and chat messages with fast, low-cost responses. Accept optional image inputs to cap...
Generate and reason over text and images for chat, coding, translation, and analysis. Accept a single text prompt or cha...
Generate text from prompts or chat messages, with optional image inputs for visual understanding and captioning, and ret...
Generate and reason over text with optional image inputs, returning text outputs. Handle long-context tasks with a 200k-...
Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...
Generate and chat in natural language from text prompts, with optional image inputs for visual understanding and image-t...
Generate text and code from a prompt, with optional image analysis for captions and visual reasoning. Accepts a text pro...
Generate and reason over text from prompts or chat messages, with optional image inputs for multimodal understanding; ou...