spuuntries/urna-kp3l
Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...
Found 166 models (showing 1-20)
Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...
Generate images from text, edit existing images with natural-language instructions, and answer questions about images. T...
Segment objects in images from natural-language instructions and answer visual questions. Provide an image plus a text i...
Segment objects in images from natural-language instructions and answer grounded visual questions. Takes an image and a...
Segment objects and regions in images using natural language instructions. Accepts an image and a text instruction and r...
Generate captions, answers, and summaries for images and videos. Accepts an image or video plus a text prompt and output...
Answer questions about images, caption scenes, and localize entities with bounding boxes. Accept a ChatML-formatted prom...
Extract content and answer questions from images of documents. Takes an image plus a text prompt or question and outputs...
Caption images and answer visual questions from an input image. Optionally evaluate image-text matching. Provide an imag...
Answer questions about images and generate captions from a single input image. Provide an image and a natural-language q...
Answer questions about images. Provide an image and a natural-language question to receive a text answer, or switch to c...
Answer questions about images and extract text from images. Takes an image and a text prompt and returns a text response...
Generate booru-style tags from an input image. Extracts multi-label, Danbooru-style keywords covering subjects, attribut...
Answer questions about images. Takes an image and a text prompt, and returns a text answer, enabling visual question ans...
Generates text responses from prompts with ultra-low latency and fast response times. Supports up to 1 million token con...
Generates text responses based on prompts or conversation messages, with support for image input analysis. This is the f...
Answer questions about images and generate image-grounded text from an image and a text prompt. Perform visual question...
Answer questions about images and generate captions from an input image and a text prompt, returning text. Handle genera...
Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...
Answer questions about images and videos, perform OCR, and describe scenes, returning text. Accepts an image or a video...