spuuntries/urna-kp3l
Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...
Found 149 models (showing 1-20)
Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...
Generate images from text, edit existing images with natural-language instructions, and answer questions about images. T...
Segment objects in images from natural-language instructions and answer visual questions. Provide an image plus a text i...
Segment objects in images from natural-language instructions and answer grounded visual questions. Takes an image and a...
Segment objects and regions in images using natural language instructions. Accepts an image and a text instruction and r...
Generate captions, answers, and summaries for images and videos. Accepts an image or video plus a text prompt and output...
Answer questions about images, caption scenes, and localize entities with bounding boxes. Accept a ChatML-formatted prom...
Extract content and answer questions from images of documents. Takes an image plus a text prompt or question and outputs...
Caption images and answer visual questions from an input image. Optionally evaluate image-text matching. Provide an imag...
Answer questions about images and generate captions from a single input image. Provide an image and a natural-language q...
Answer questions about images. Provide an image and a natural-language question to receive a text answer, or switch to c...
Answer questions about images and extract text from images. Takes an image and a text prompt and returns a text response...
Generate booru-style tags from an input image. Extracts multi-label, Danbooru-style keywords covering subjects, attribut...
Answer questions about images. Takes an image and a text prompt, and returns a text answer, enabling visual question ans...
Generate and structure text from prompts or multi-turn chat messages, with optional image inputs for basic visual unders...
Generate text from prompts or chat messages, with optional image analysis for multimodal reasoning. Handle instruction f...
Answer questions about images and generate image-grounded text from an image and a text prompt. Perform visual question...
Answer questions about images and generate captions from an input image and a text prompt, returning text. Handle genera...
Analyze images and answer questions from an image plus text prompt, returning text. Handle visual question answering (VQ...
Answer questions about images and videos, perform OCR, and describe scenes, returning text. Accepts an image or a video...