image-captioning AI Models - Page 4

sai88uk/minicpm-v-45-v9

Analyzes single images, multiple images, and high-FPS videos to answer questions about their content. Supports controlla...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 14.3K runs

openai/gpt-5-pro

Generates text responses with built-in reasoning capabilities for complex problem-solving and expert-level analysis. Sup...

📝 → 📝 • text-generation • image-analysis • visual-understanding • 4.8K runs

baaivision/emu3-chat

Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...

🖼️ → 📝 • image-to-text • image-captioning • image-analysis • 28 runs

google-deepmind/gemma-3-12b-it

Generates text responses from text prompts and optional image inputs. Supports multimodal capabilities for analyzing and...

🖼️ → 📝 • text-generation • image-to-text • question-answering • 25.2K runs

smartinezbragado/salesforce-blip2

Caption images and answer visual questions from an input image. Provide an image and either generate a caption or ask a...

🖼️ → 📝 • image-to-text • visual-question-answering • 967 runs

pku-yuangroup/llava-cot

Answer questions about images with step-by-step reasoning. Take an image and an optional text prompt and output text, in...

🖼️ → 📝 • image-to-text • image-captioning • image-analysis • 46 runs

jimothyjohn/phi3-vision-instruct

Analyze images and answer questions about them using text prompts. Built on Microsoft's Phi-3.5-Vision-Instruct, a light...

🖼️ → 📝 • image-to-text • visual-understanding • text-generation • 208 runs

openai/gpt-5-structured

Generate text content with structured outputs, web search capabilities, and custom tools based on text prompts and image...

🖼️ → 📝 • text-generation • image-to-text • document-to-json • 538.1K runs

hayooucom/vision-model2

Generates detailed textual descriptions of images based on input prompts. Utilizes a vision-language model to analyze an...

📝 → 📝 • image-captioning • vision-language • text-generation • 346 runs

justmalhar/meta-llama-3.2-11b-vision

Analyze images and generate text responses based on visual content and text prompts. Based on Meta's Llama 3.2 11B visio...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 312 runs

yoadtew/arithmetic

Generates descriptive text captions from three input images using arithmetic operations on image features. The model com...

🖼️ → 📝 • image-to-text • image-captioning • visual-understanding • 94 runs

rmokady/clip_prefix_caption

Generate text captions describing the content of input images. Uses CLIP visual encoding combined with GPT-2 language ge...

🖼️ → 📝 • image-to-text • 1.7M runs

yoadtew/zero-shot-image-to-text

Generate text captions from images using a zero-shot approach. Takes an image input and produces descriptive text output...

🖼️ → 📝 • image-to-text • 6.7K runs

openai/gpt-4o-mini

Generates text responses from prompts using OpenAI's GPT-4o mini model with low latency and cost optimization. Supports...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 39.3M runs

openai/gpt-4.1

Generate text responses for complex tasks with a 1-million-token context window and multimodal capabilities. Features impr...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 335.9K runs

lidarbtc/kollava-v1.5

Answer questions about images in Korean. Take an image and a Korean prompt and generate Korean text for visual question...

🖼️ • image-captioning • image-analysis • visual-understanding • 66 runs

google-deepmind/gemma-3-27b-it

Generates text based on text prompts and optional image inputs. Handles multimodal tasks combining text and image analys...

🖼️ → 📝 • text-generation • image-to-text • question-answering • 36.4K runs

lucataco/internvl3_5-30b

Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...

🖼️ → 📝 • image-to-text • video-to-text • 63 runs

zsxkib/uform-gen

Generates text captions and answers questions about images using a fast 1.5B parameter multimodal language model. Takes...

🖼️ → 📝 • image-to-text • text-generation • image-captioning • 2.4K runs

lucataco/qwen3-vl-8b-instruct

Analyze images and videos to generate detailed text descriptions and answers to questions. Supports both image and video...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 93.6K runs