sai88uk/minicpm-v-45-v9
Answer questions about images and videos, perform OCR, and describe scenes, returning text. Accepts an image or a video...
Found 90 models (showing 61-80)
Generates text responses with built-in reasoning capabilities for complex problem-solving and expert-level analysis. Sup...
Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...
Generates text responses from text prompts and optional image inputs. Supports multimodal capabilities for analyzing and...
Caption images and answer visual questions from an input image. Provide an image and either generate a caption or ask a...
Answer questions about images with step-by-step reasoning. Take an image and an optional text prompt and output text, in...
Answer questions about images from an image and text prompt, returning text responses. Perform visual question answering...
Generate text content with structured outputs, web search capabilities, and custom tools based on text prompts and image...
Generates detailed textual descriptions of images based on input prompts. Utilizes a vision-language model to analyze an...
Answer questions about images and generate captions from an image and a text prompt, returning text. Perform visual ques...
Generates descriptive text captions from three input images using arithmetic operations on image features. The model com...
Generate image captions from a single image input. Select from COCO or Conceptual Captions modes and optionally use beam...
Caption images. Accepts an image input and generates a zero-shot natural-language description, optionally conditioned by...
Generates text responses from prompts using OpenAI's GPT-4o mini model with low latency and cost optimization. Supports...
Generate text responses for complex tasks with 1 million token context window and multimodal capabilities. Features impr...
Answer questions about images in Korean. Take an image and a Korean prompt and generate Korean text for visual question...
Generates text based on text prompts and optional image inputs. Handles multimodal tasks combining text and image analys...
Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...
Caption images and answer visual questions from an input image, returning text. Accept an image and an optional instruct...
Caption images and videos and answer visual questions. Accepts an optional image or video plus a text prompt and returns...