Image-captioning AI Models

spuuntries/urna-kp3l

Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...

🖼️ → 📝 • image-to-text • image-captioning • visual-question-answering • 108 runs

lucataco/qwen-vl-chat

Analyzes images and answers questions about them through conversational interaction. Takes an image and a text prompt as...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 826.4K runs

andreasjansson/blip-2

Answers questions about images and generates image captions. Takes an image and a text question as input, returning a te...

🖼️ → 📝 • image-to-text • visual-understanding • 31.8M runs

zsxkib/blip-3

Answers questions about images and generates image captions using the BLIP-3/XGen-MM multimodal model. Takes an image and a...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.3M runs

yorickvp/llava-13b

Analyzes images and answers questions about them through conversational text generation. Combines visual understanding w...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 35.8M runs

lucataco/moondream2

Analyzes images and generates text descriptions based on visual content and optional prompts. This small vision language...

🖼️ → 📝 • image-to-text • visual-understanding • ocr • 13.1M runs

camenduru/moe-llava

Analyze images to identify unusual or noteworthy elements based on textual prompts. This model processes an input image...

🖼️ • image-analysis • visual-understanding • image-captioning • 1.4M runs

yorickvp/llava-v1.6-vicuna-13b

Analyze images and answer questions about them in natural language. Accepts a text prompt and an optional image and retu...

🖼️ → 📝 • image-to-text • text-generation • image-analysis • 3.7M runs

lucataco/bakllava

Caption images and answer questions about images. Takes an image and a text prompt as input and returns text, enabling i...

🖼️ → 📝 • image-to-text • visual-question-answering • visual-understanding • 39.8K runs

lucataco/llama-3-vision-alpha

Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 6.8K runs

lucataco/ollama-llama3.2-vision-90b

Generates text responses based on image and text inputs using Meta's Llama 3.2-Vision 90B multimodal language model. Per...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 4.6K runs

lucataco/ollama-llama3.2-vision-11b

Generates text responses based on both text prompts and images using Meta's Llama 3.2 Vision 11B model. Analyzes and und...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 9.9K runs

adirik/vila-2.7b

Analyzes images and generates text responses to questions about the visual content. Takes an image and text prompt as in...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 2.6K runs

lucataco/qvq-72b-preview

Analyzes images and answers questions about visual content with enhanced reasoning capabilities. Takes an image and text...

🖼️ → 📝 • image-to-text • visual-understanding • text-generation • 297 runs

naklecha/cogvlm

Analyzes images and responds to text prompts about visual content. Takes an image and a text prompt as input, then gener...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 12.5K runs

adirik/vila-7b

Answers questions about images using natural language. Takes an image and text prompt as input and generates contextual...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 7.4K runs

ibm-granite/granite-vision-3.3-2b

Analyze documents and images from one or more image inputs plus a text prompt, returning text captions, OCR, and answers...

🖼️ → 📝 • image-to-text • ocr • image-captioning • 42.1K runs

zsxkib/idefics3

Answers questions about images and generates detailed captions based on visual content and text prompts. Processes both...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 2.7K runs

adirik/bunny-phi-2-siglip

Answers questions about images using natural language prompts. Built on SigLIP and Phi-2, this lightweight multimodal mo...

🖼️ → 📝 • image-to-text • text-generation • 7.9K runs

openai/gpt-5

Generate text responses from prompts with advanced reasoning, code generation, and image analysis capabilities. Supports...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 2.2M runs