
lucataco/qwen-vl-chat
Answer questions about images. Provide an image and a natural-language question to receive a text response that handles...
Found 49 models (showing 1-20)
Answer questions about images. Provide an image and a natural-language question to receive a text response that handles...
Answer questions about images, perform OCR, and caption visual content. Takes an image and a text prompt and outputs tex...
Analyze images to identify unusual or noteworthy elements based on textual prompts. This model processes an input image...
Caption images and answer visual questions from an input image and a text query. Return text responses for VQA, image de...
Answer questions about images in a multi-turn chat from an image and text prompt, returning text. Perform visual questio...
Answer questions about images from an image and text prompt, returning text. Perform visual question answering (VQA), im...
Answer questions about images and caption them from an image plus text prompt, returning text. Perform visual recognitio...
Generate and reason about text from prompts or chat messages for coding, analysis, and instruction following. Accept ima...
Generate text and code from a prompt, with optional image input for captioning and visual analysis. Supports fast standa...
Moderate images and accompanying user messages by classifying safety risks. Takes an image and optional text input; outp...
Predicts the age of a person in an input image using CLIP by computing the similarity between age-related prompts and th...
Predicts age from an input image using a CLIP model.
Generate text quickly for chat, question answering, code generation, classification, summarization, and translation. Acc...
Generate text for chat, coding, and reasoning from text prompts or multi-turn messages, with optional image input for an...
Solve complex reasoning tasks and generate text from prompts or chat messages. Accept text or messages and optional imag...
Generate and reason over text from prompts, with optional image analysis. Accepts text and an optional image, and return...
Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...
Generate text and analyze images with a fast, low-cost multimodal GPT-4o variant. Accept text prompts or chat message ar...
Generate text and code from prompts, with optional image analysis and visual question answering. Accepts a text prompt a...
Generate text from prompts or chat messages, with optional image inputs for multimodal reasoning and captioning. Accept...