
lucataco/qwen-vl-chat
Answer questions about images. Provide an image and a natural-language question to receive a text response that handles...
Found 52 models (showing 1-20)
Answer questions about images. Provide an image and a natural-language question to receive a text response that handles...
Answer questions about images and extract text from images. Takes an image and a text prompt and returns text, enabling...
Analyze images to identify unusual or noteworthy elements based on textual prompts. This model processes an input image...
Caption images and answer visual questions from an input image and a text query. Return text responses for VQA, image de...
Answer questions about images in a multi-turn chat from an image and text prompt, returning text. Perform visual questio...
Answer questions about images from an image and text prompt, returning text. Perform visual question answering (VQA), im...
Answer questions about images and caption them from an image plus text prompt, returning text. Perform visual recognitio...
Generate and reason over text for chat, coding, and complex problem solving. Accept prompts or multi-turn messages and o...
Generate text from prompts with optional image analysis and captioning. Takes a text prompt (and optionally an image) an...
Moderate images and accompanying user messages by classifying safety risks. Takes an image and optional text input; outp...
Predicts the age of a person in an input image using CLIP by computing the similarity between age-related prompts and th...
Predicts age from an input image using a CLIP model.
Generate text and code from prompts or chat messages with low latency and cost. Optionally analyze images to describe co...
Generate text for chat, coding, and reasoning from text prompts or multi-turn messages, with optional image input for an...
Solve complex reasoning tasks and generate text from prompts or chat messages. Accept text or messages and optional imag...
Generate and reason over text from prompts, with optional image analysis. Accepts text and an optional image, and return...
Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...
Generate text and analyze images with a fast, low-cost multimodal GPT-4o variant. Accept text prompts or chat message ar...
Generate text and code from prompts, with optional image analysis and visual question answering. Accepts a text prompt a...
Generate text from prompts or chat messages, with optional image inputs for multimodal reasoning and captioning. Accept...