visual-question-answering AI Models - Page 2

lidarbtc/kollava-v1.5

Answer questions about images in Korean. Take an image and a Korean prompt and generate Korean text for visual question...

🖼️ → 📝 • image-captioning • image-analysis • visual-understanding • 66 runs

lucataco/blip3-phi3-mini-instruct-r-v1

Answer questions about images and generate captions from an image input and a natural-language question, returning text....

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 399 runs

smartinezbragado/salesforce-blip2

Caption images and answer visual questions from an input image. Provide an image and either generate a caption or ask a...

🖼️ → 📝 • image-to-text • visual-question-answering • 967 runs

adirik/bunny-phi-2-siglip

Answers questions about images using natural language prompts. Built on SigLIP and Phi-2, this lightweight multimodal mo...

🖼️ → 📝 • image-to-text • text-generation • 7.9K runs

chigozienri/llava-birds

Identify bird species and answer bird-related questions from an input image and text prompt, returning text. Perform vis...

🖼️ → 📝 • image-to-text • visual-question-answering • bird-identification • 74 runs

daanelson/minigpt-4

Generates text descriptions, stories, and responses based on input images and prompts. Takes an image and text prompt as...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 1.8M runs

muqtadar08/llava_phi-3-mini

Analyzes images and answers questions about their content using the LLaVA Phi-3 Mini vision-language model. Takes an ima...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 15 runs

nelsonjchen/minigpt-4_vicuna-13b

Answer questions about images and generate detailed image captions using MiniGPT-4 with the Vicuna-13B language model. Takes...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 52.0K runs

nelsonjchen/minigpt-4_vicuna-7b

Analyzes images and answers questions about them using MiniGPT-4 with the Vicuna-7B language model. Takes an image and an op...

🖼️ → 📝 • image-to-text • image-captioning • 9.9K runs

lucataco/paligemma-3b-pt-224

Analyzes images and generates text responses based on prompts and visual content. Built on Google's PaliGemma 3B archite...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 4.1K runs

lucataco/moondream-0.5b

Answer questions about images and generate captions from an image input. Takes an image and a text prompt (e.g., “Descri...

🖼️ → 📝 • image-to-text • visual-question-answering • image-captioning • 64 runs

jyoung105/moondream

Answer questions about images from a text prompt, returning a text response. Accepts an input image and a prompt and out...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 310 runs

lucataco/fuyu-8b

Generates text responses based on image and text prompts using a multimodal transformer architecture. Takes an image an...

🖼️ → 📝 • image-to-text • text-generation • 14.7K runs

ignaciosgithub/pllava

Answer questions about images from an image and a text prompt, returning text. Generate captions, short answers, and exp...

🖼️ → 📝 • image-to-text • visual-question-answering • 298 runs

cjwbw/pix2struct

Analyzes images and answers questions or generates captions based on textual prompts. Provides six specialized models tr...

🖼️ → 📝 • image-to-text • visual-question-answering • ocr • 6.1K runs

chenxwh/deepseek-vl2

Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Processes sin...

🖼️ → 📝 • image-to-text • ocr • visual-understanding • 1.1K runs

lucataco/nemotron-nano-vl-8b-v1

Analyzes images and answers questions about their content using a document-intelligence vision-language model. Takes an...

🖼️ → 📝 • image-to-text • document-to-json • ocr • 17 runs

cjwbw/uform-gen2-qwen-500m

Analyzes images and generates text responses to questions or instructions about visual content. This compact multimodal...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 413 runs

lucataco/llama-3-vision-alpha

Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 6.8K runs

ibm-granite/granite-vision-3.3-2b

Analyze documents and images from one or more image inputs plus a text prompt, returning text captions, OCR, and answers...

🖼️ → 📝 • image-to-text • ocr • image-captioning • 42.1K runs