aodianyun/minicpm-v-26
Caption images and videos. Take an image or video plus an optional prompt and return text that describes the visual cont...
Found 150 models (showing 61-80)
Generate captions, answers, and summaries from an input image or video. Accept an image or video plus an optional prompt...
Generate text descriptions for images and videos. Accepts a single image or video plus an optional instruction prompt, a...
Answer questions about images and generate captions. Accepts an image and an optional text prompt/question, and returns...
Generate and reason over text from prompts or multi-turn chat, with optional image inputs for vision understanding and i...
Generate and reason over text from prompts or chat messages, with optional image inputs for multimodal understanding; ou...
Generate and reason over text and code from a prompt, with optional image input for captioning and visual analysis, and...
Generate and analyze text and code from a prompt, with optional image input for visual understanding and data extraction...
Generate structured JSON or free-form text from prompts, with optional web search and tool use. Accepts text and images...
Caption images and answer natural-language questions about an input image. Provide an image and an optional instruction...
Answer questions about images and generate captions from an image and a text query, returning text. Accept a single imag...
Solve complex reasoning tasks and generate text responses from prompts, multi-turn chat messages, and images. Accept a s...
Generate and analyze text with optional image input. Accept a text prompt and an optional image and return text, support...
Generate text from text and image inputs. Perform question answering, reasoning, document summarization, data analysis,...
Generate text from prompts and optionally analyze images with captioning and visual question answering. Accepts text and...
Generate and analyze text with optional image inputs, returning text for tasks like captioning and visual question answe...
Caption images and answer visual questions from an input image and text prompt, returning text. Handle multilingual outp...
Answer questions about images and generate captions from an image input. Takes an image and a text prompt (e.g., “Descri...
Answer questions about images. Takes an image and a text prompt and returns a text response, enabling visual question an...
Caption images. Takes an input image and returns a short natural-language description as text, useful for alt text, acce...