Image-to-Text AI Models - Page 3

lidarbtc/kollava-v1.5

Answer questions about images in Korean. Take an image and a Korean prompt and generate Korean text for visual question...

🖼️ • image-captioning • image-analysis • visual-understanding • 66 runs

lucataco/blip3-phi3-mini-instruct-r-v1

Answer questions about images and generate captions from an image input and a natural-language question, returning text....

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 399 runs

smartinezbragado/salesforce-blip2

Caption images and answer visual questions from an input image. Provide an image and either generate a caption or ask a...

🖼️ → 📝 • image-to-text • visual-question-answering • 967 runs

pharmapsychotic/clip-interrogator

Generate text prompts from an input image for use with text-to-image models. Analyze artists, mediums, and styles using...

🖼️ → 📝 • image-to-text • prompt-generation • 4.6M runs

zsxkib/clip-age-predictor

Predict a person's age from an input image. Take a photo containing a face and return an estimated age (1–99) as text....

age-estimation • face-analysis • 218.3K runs

adirik/bunny-phi-2-siglip

Answers questions about images using natural language prompts. Built on SigLIP and Phi-2, this lightweight multimodal mo...

🖼️ → 📝 • image-to-text • text-generation • 7.9K runs

lucataco/clip-interrogator

Generate text prompts from an input image. Combine CLIP and BLIP to analyze the image and produce descriptive prompts op...

🖼️ → 📝 • image-to-text • prompt-generation • 123.2K runs

openai/o4-mini

Generate text responses with advanced reasoning capabilities, specializing in math, coding, and visual analysis. Process...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 470.2K runs

chigozienri/llava-birds

Identify bird species and answer bird-related questions from an input image and text prompt, returning text. Perform vis...

🖼️ → 📝 • image-to-text • visual-question-answering • bird-identification • 74 runs

cjwbw/cogagent-chat

Answer questions about images and GUI screenshots. Take an image and a natural-language query and return a text respon...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 2.3K runs

openai/gpt-4.1-mini

Generate text responses from prompts with support for image analysis and visual understanding. Fast, lightweight languag...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 2.6M runs

openai/gpt-4o-mini

Generates text responses from prompts using OpenAI's GPT-4o mini model with low latency and cost optimization. Supports...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 39.3M runs

openai/gpt-4o

Generates text responses from text prompts, messages, and images with multimodal capabilities. Processes both text and v...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 723.2K runs

openai/gpt-4.1

Generate text responses for complex tasks with a 1-million-token context window and multimodal capabilities. Features impr...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 335.9K runs

anthropic/claude-3.5-sonnet

Generate and reason over text with optional image inputs, returning text outputs. Handle long-context tasks with a 200k-...

🖼️ → 📝 • text-generation • image-to-text • visual-understanding • 578.0K runs

yorickvp/llava-v1.6-vicuna-13b

Analyze images and answer questions about them in natural language. Accepts a text prompt and an optional image and retu...

🖼️ → 📝 • image-to-text • text-generation • image-analysis • 3.7M runs

yorickvp/llava-v1.6-mistral-7b

Multimodal language model that analyzes images and generates text responses based on visual content and text prompts. Bu...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 5.0M runs

baaivision/emu3-chat

Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...

🖼️ → 📝 • image-to-text • image-captioning • image-analysis • 28 runs

daanelson/minigpt-4

Generates text descriptions, stories, and responses based on input images and prompts. Takes an image and text prompt as...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 1.8M runs

muqtadar08/llava_phi-3-mini

Analyzes images and answers questions about their content using the LLaVA Phi-3 Mini vision-language model. Takes an ima...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 15 runs