spuuntries/urna-kp3l
Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...
Found 54 models (showing 1-20)
Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...
Segment objects in images from natural-language instructions and answer visual questions. Provide an image plus a text i...
Segment objects in images from natural-language instructions and answer grounded visual questions. Takes an image and a...
Caption images and answer visual questions from an input image. Optionally evaluate image-text matching. Provide an imag...
Answer questions about images and generate captions from a single input image. Provide an image and a natural-language q...
Answer questions about images. Provide an image and a natural-language question to receive a text answer, or switch to c...
Answer questions about images and generate image-grounded text from an image and a text prompt. Perform visual question...
Answer questions about images and generate captions from an input image and a text prompt, returning text. Handle genera...
Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...
Analyzes images and answers questions about them using a visual language model. Takes an image and a text query as input...
Answers questions about images through multimodal understanding. Takes an image and a text question as input and generat...
Caption images and answer questions about images. Takes an image and a text prompt as input and returns text, enabling i...
Answer questions about images and generate text descriptions. Accepts an image and a natural-language prompt; returns te...
Analyzes images and answers questions about visual content with enhanced reasoning capabilities. Takes an image and text...
Analyzes images and responds to text prompts about visual content. Takes an image and a text prompt as input, then gener...
Answer questions about images from an image and text prompt, returning text. Perform visual question answering, image ca...
Answers questions about images using natural language. Takes an image and text prompt as input and generates contextual...
Analyzes images and answers questions about them using a unified autoregressive framework for multimodal understanding....
Answer questions about images and generate captions from an image and a text prompt, outputting text. Perform visual que...
Answer questions about images and documents from an image and a text prompt, returning text. Handle visual question answ...