visual-question-answering AI Models - Page 3

ibm-granite/granite-vision-3.2-2b

Extract content and answer questions from images of documents. Takes an image plus a text prompt or question and outputs...

🖼️ → 📝 • image-to-text • ocr • 118.0K runs

yimi81/yi-vl-6b

Answer questions about images and generate captions from an image and a text query, returning text. Accept a single imag...

🖼️ → 📝 • image-to-text • visual-question-answering • image-captioning • 309 runs

jyoung105/imp

Answer questions about images. Takes an image and a text prompt and returns a text response, enabling visual question an...

🖼️ → 📝 • image-to-text • visual-question-answering • 71 runs

cuuupid/glm-4v-9b

Generates text responses to questions about images with multimodal understanding capabilities. Takes an image and text p...

🖼️ → 📝 • image-to-text • ocr • text-generation • 93.8K runs

jyoung105/honeybee

Analyzes images and generates text responses based on both the image content and text prompts. Uses a locality-enhanced...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 30 runs

jimothyjohn/phi3-vision-instruct

Analyze images and answer questions about them using text prompts. Built on Microsoft's Phi-3.5-Vision-Instruct, a light...

🖼️ → 📝 • image-to-text • visual-understanding • text-generation • 208 runs

lucataco/moondream1

Answer questions about images. Takes an image and a text prompt, and returns a text answer, enabling visual question ans...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 11.5K runs

deepseek-ai/deepseek-vl-7b-base

Analyzes images and generates text descriptions or responses to prompts about visual content. Processes diverse image ty...

🖼️ → 📝 • image-to-text • visual-understanding • 6.6K runs

cjwbw/cogagent-chat

Answer questions about images and GUI screenshots. Takes an image and a natural-language query and returns a text respon...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 2.3K runs

lucataco/qwen-vl-chat

Analyzes images and answers questions about them through conversational interaction. Takes an image and a text prompt as...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 826.4K runs

zsxkib/uform-gen

Generates text captions and answers questions about images using a fast 1.5B parameter multimodal language model. Takes...

🖼️ → 📝 • image-to-text • text-generation • image-captioning • 2.4K runs

chenxwh/cogvlm2

Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 6.6K runs

hexiaochun/minicpm_v26

Generate text descriptions and answers from an input image or video. Accept an optional instruction or question prompt t...

🖼️ → 📝 • image-to-text • video-to-text • visual-question-answering • 518 runs

lucataco/idefics-9b

Answer questions about an input image from a text prompt, returning text. Generate image captions and short visual descr...

🖼️ → 📝 • image-to-text • visual-question-answering • image-captioning • 2.1K runs