image-to-text AI Models - Cloudernative

spuuntries/urna-kp3l

Caption images and answer visual questions from an image and a text prompt. Accepts an input image and an instruction (e...

🖼️ → 📝 • image-to-text • image-captioning • visual-question-answering • 108 runs

bytedance/bagel

Generate images from text, edit existing images with natural-language instructions, and answer questions about images. T...

📝 → 🖼️ • text-to-image • image-editing • image-to-text • 150.4K runs

bytedance/sa2va-26b-image

Segment objects in images from natural-language instructions and answer visual questions. Provide an image plus a text i...

🖼️ → 📝 • image-segmentation • image-to-text • 6.4K runs

bytedance/sa2va-8b-image

Analyzes images with text instructions to provide visual understanding and object segmentation. Combines SAM2 segmentati...

🖼️ → 📝 • image-to-text • image-segmentation • visual-understanding • 48.3K runs

bytedance/sa2va-4b-image

Segment objects and regions in images using natural language instructions. Accepts an image and a text instruction and r...

🖼️ • image-segmentation • visual-grounding • referring-segmentation • 132 runs

lucataco/qwen2-vl-7b-instruct

Analyze images and videos with text prompts to generate detailed descriptions, answer questions, and extract information...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 547.2K runs

nomagick/qwen-vl-chat

Generates text responses based on text prompts and images with ChatML prompt interface and streaming support. Accepts up...

🖼️ → 📝 • text-generation • image-to-text • image-analysis • 1.1K runs

ibm-granite/granite-vision-3.2-2b

Extract content and answer questions from images of documents. Takes an image plus a text prompt or question and outputs...

🖼️ → 📝 • image-to-text • ocr • 118.0K runs

salesforce/blip

Generate image captions, answer questions about images, or match images with text descriptions. Supports three main task...

🖼️ → 📝 • image-to-text • visual-question-answering • 173.0M runs

andreasjansson/blip-2

Answers questions about images and generates image captions. Takes an image and a text question as input, returning a te...

🖼️ → 📝 • image-to-text • visual-understanding • 31.8M runs

zsxkib/blip-3

Answers questions about images and generates image captions using BLIP-3/XGen-MM multimodal model. Takes an image and a...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.3M runs

cuuupid/glm-4v-9b

Generates text responses to questions about images with multimodal understanding capabilities. Takes an image and text p...

🖼️ → 📝 • image-to-text • ocr • text-generation • 93.8K runs

pandas9/joytag

Generate booru-style tags from an input image. Extracts multi-label, Danbooru-style keywords covering subjects, attribut...

🖼️ → 📝 • image-to-text • image-tagging • 18.6K runs

lucataco/moondream1

Answer questions about images. Takes an image and a text prompt, and returns a text answer, enabling visual question ans...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 11.5K runs

openai/gpt-4.1-nano

Generates text responses from prompts with ultra-low latency and fast response times. Supports up to 1 million token con...

🖼️ → 📝 • text-generation • image-to-text • text-embedding • 2.3M runs

openai/gpt-5-nano

Generates text responses based on prompts or conversation messages, with support for image input analysis. This is the f...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 13.9M runs

yorickvp/llava-13b

Analyzes images and answers questions about them through conversational text generation. Combines visual understanding w...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 35.8M runs

lucataco/moondream2

Analyzes images and generates text descriptions based on visual content and optional prompts. This small vision language...

🖼️ → 📝 • image-to-text • visual-understanding • ocr • 13.1M runs

deepseek-ai/deepseek-vl2

Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...

🖼️ → 📝 • image-to-text • ocr • text-generation • 98.6K runs

sai88uk/minicpm-v-45-v9

Analyzes single images, multiple images, and high-FPS videos to answer questions about their content. Supports controlla...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 14.3K runs