image-captioning AI Models - Page 3

cuuupid/glm-4v-9b

Generates text responses to questions about images with multimodal understanding capabilities. Takes an image and text p...

🖼️ → 📝 • image-to-text • ocr • text-generation • 93.8K runs

samim23/internlm-xcomposer2

Caption images and answer visual questions from a text prompt and optional image, returning text. Support long-context i...

🖼️ → 📝 • image-to-text • image-captioning • text-generation • 95 runs

deepseek-ai/deepseek-vl2-small

Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Takes an imag...

🖼️ → 📝 • image-to-text • ocr • text-generation • 6.5K runs

deepseek-ai/deepseek-vl-7b-base

Analyzes images and generates text descriptions or responses to prompts about visual content. Processes diverse image ty...

🖼️ → 📝 • image-to-text • visual-understanding • 6.6K runs

openai/gpt-4o

Generates text responses from text prompts, messages, and images with multimodal capabilities. Processes both text and v...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 723.2K runs

muqtadar08/llava_phi-3-mini

Analyzes images and answers questions about their content using the LLaVA Phi-3 Mini vision-language model. Takes an ima...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 15 runs

nohamoamary/image-description-base-model

Caption images. Takes a single image as input and returns a concise natural-language description of the scene, objects,...

🖼️ → 📝 • image-to-text • image-captioning • 1.3K runs

lucataco/nemotron-nano-vl-8b-v1

Analyzes images and answers questions about their content using a document intelligence vision language model. Takes an...

🖼️ → 📝 • image-to-text • document-to-json • ocr • 17 runs

salesforce/blip

Generate image captions, answer questions about images, or match images with text descriptions. Supports three main task...

🖼️ → 📝 • image-to-text • visual-question-answering • 173.0M runs

cjwbw/cogvlm

Analyzes images and answers questions about them using a visual language model. Takes an image and a text query as input...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.5M runs

lucataco/magma-8b

Analyzes images and answers questions about visual content through multimodal conversation. Designed as a foundation mod...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 910 runs

openai/o4-mini

Generate text responses with advanced reasoning capabilities, specializing in math, coding, and visual analysis. Process...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 470.2K runs

zsxkib/molmo-7b

Answers questions and generates captions about images using a 7B parameter vision-language model. Based on Qwen2-7B and...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.3M runs

cjwbw/uform-gen2-qwen-500m

Analyzes images and generates text responses to questions or instructions about visual content. This compact multimodal...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 413 runs

google-deepmind/gemma-3-4b-it

Generate text based on text prompts and optional image inputs. This multimodal language model handles both text and imag...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 13.3K runs

lucataco/smolvlm-instruct

Analyzes images and generates text responses based on visual content and text prompts. Accepts arbitrary sequences of im...

🖼️ → 📝 • image-to-text • visual-understanding • document-understanding • 8.3K runs

chenxwh/cogvlm2

Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 6.6K runs

anthropic/claude-4.5-haiku

Generate text for chat, Q&A, coding, and document workflows with fast, low-latency responses. Accept text prompts and op...

📝 → 📝 • text-generation • image-captioning • image-analysis • 4.0K runs

openai/gpt-4.1-mini

Generate text responses from prompts with support for image analysis and visual understanding. Fast, lightweight languag...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 2.6M runs

anthropic/claude-3.5-sonnet

Generate and reason over text with optional image inputs, returning text outputs. Handle long-context tasks with a 200k-...

🖼️ → 📝 • text-generation • image-to-text • visual-understanding • 578.0K runs