image-to-text AI Models

aodianyun/minicpm-v-26

Analyzes images and videos with text prompts, providing detailed visual understanding and question answering capabilitie...

🖼️ → 📝 • image-to-text • visual-understanding • 16 runs

hexiaochun/minicpm_v26

Generate text descriptions and answers from an input image or video. Accept an optional instruction or question prompt t...

🖼️ → 📝 • image-to-text • video-to-text • visual-question-answering • 518 runs

aodianyun/minicpm-v-26-int4

Generate text descriptions for images and videos. Accepts a single image or video plus an optional instruction prompt, a...

🖼️ → 📝 • image-to-text • video-to-text • 12 runs

nelsonjchen/minigpt-4_vicuna-13b

Answer questions about images and generate detailed image captions using MiniGPT-4 with the Vicuna-13B language model. Takes...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 52.0K runs

openai/gpt-5

Generate text responses from prompts with advanced reasoning, code generation, and image analysis capabilities. Supports...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 2.2M runs

openai/gpt-5-pro

Generates text responses with built-in reasoning capabilities for complex problem-solving and expert-level analysis. Sup...

📝 → 📝 • text-generation • image-analysis • visual-understanding • 4.8K runs

anthropic/claude-4-sonnet

Generate text content from prompts with advanced reasoning and coding capabilities. Claude Sonnet 4 supports both standa...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 3.0M runs

anthropic/claude-3.7-sonnet

Generate text responses based on prompts with support for image analysis. Features particularly strong capabilities in c...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 4.1M runs

openai/gpt-5-structured

Generate text content with structured outputs, web search capabilities, and custom tools based on text prompts and image...

🖼️ → 📝 • text-generation • image-to-text • document-to-json • 538.1K runs

nelsonjchen/minigpt-4_vicuna-7b

Analyzes images and answers questions about them using MiniGPT-4 with the Vicuna-7B language model. Takes an image and an op...

🖼️ → 📝 • image-to-text • image-captioning • 9.9K runs

yimi81/yi-vl-6b

Answer questions about images and generate captions from an image and a text query, returning text. Accept a single imag...

🖼️ → 📝 • image-to-text • visual-question-answering • image-captioning • 309 runs

openai/o1

Generate text responses with advanced reasoning capabilities, specializing in complex problem-solving across mathematics...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 18.7K runs

google-deepmind/gemma-3-4b-it

Generate text based on text prompts and optional image inputs. This multimodal language model handles both text and imag...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 13.3K runs

google-deepmind/gemma-3-12b-it

Generates text responses from text prompts and optional image inputs. Supports multimodal capabilities for analyzing and...

🖼️ → 📝 • text-generation • image-to-text • question-answering • 25.2K runs

google-deepmind/gemma-3-27b-it

Generates text based on text prompts and optional image inputs. Handles multimodal tasks combining text and image analys...

🖼️ → 📝 • text-generation • image-to-text • question-answering • 36.4K runs

paragekbote/gemma3-torchao-quant-sparse

Generate text and analyze images from a text prompt (optionally with an image), returning text for conversation, caption...

🖼️ → 📝 • text-generation • image-to-text • image-captioning • 54 runs

lucataco/paligemma-3b-pt-224

Analyzes images and generates text responses based on prompts and visual content. Built on Google's PaliGemma 3B archite...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 4.1K runs

lucataco/moondream-0.5b

Answer questions about images and generate captions from an image input. Takes an image and a text prompt (e.g., “Descri...

🖼️ → 📝 • image-to-text • visual-question-answering • image-captioning • 64 runs

jyoung105/imp

Answer questions about images. Takes an image and a text prompt and returns a text response, enabling visual question an...

🖼️ → 📝 • image-to-text • visual-question-answering • 71 runs

webnizam/image-caption

Caption images. Takes an input image and returns a short natural-language description as text, useful for alt text, acce...

🖼️ → 📝 • image-to-text • image-captioning • 2.8K runs