Visual-Understanding AI Models

lucataco/qwen-vl-chat

Analyzes images and answers questions about them through conversational interaction. Takes an image and a text prompt as...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 826.4K runs

camenduru/moe-llava

Analyze images to identify unusual or noteworthy elements based on textual prompts. This model processes an input image...

🖼️ • image-analysis • visual-understanding • image-captioning • 1.4M runs

cuuupid/glm-4v-9b

Generates text responses to questions about images with multimodal understanding capabilities. Takes an image and text p...

🖼️ → 📝 • image-to-text • ocr • text-generation • 93.8K runs

yorickvp/llava-v1.6-vicuna-13b

Analyze images and answer questions about them in natural language. Accepts a text prompt and an optional image and retu...

🖼️ → 📝 • image-to-text • text-generation • image-analysis • 3.7M runs

lucataco/minicpm-v-4

Analyze images and videos with text prompts to generate detailed text responses. Handles single images, multiple images,...

🖼️ → 📝 • image-to-text • video-to-text • visual-understanding • 795 runs

lucataco/ollama-llama3.2-vision-90b

Generates text responses based on image and text inputs using Meta's Llama 3.2-Vision 90B multimodal language model. Per...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 4.6K runs

lucataco/moondream1

Answer questions about images. Takes an image and a text prompt, and returns a text answer, enabling visual question ans...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 11.5K runs

lucataco/ollama-llama3.2-vision-11b

Generates text responses based on both text prompts and images using Meta's Llama 3.2 Vision 11B model. Analyzes and und...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 9.9K runs

bytedance/sa2va-8b-image

Analyzes images with text instructions to provide visual understanding and object segmentation. Combines SAM2 segmentati...

🖼️ → 📝 • image-to-text • image-segmentation • visual-understanding • 48.3K runs

openai/o4-mini

Generate text responses with advanced reasoning capabilities, specializing in math, coding, and visual analysis. Process...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 470.2K runs

openai/gpt-5

Generate text responses from prompts with advanced reasoning, code generation, and image analysis capabilities. Supports...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 2.2M runs

anthropic/claude-4-sonnet

Generate text content from prompts with advanced reasoning and coding capabilities. Claude Sonnet 4 supports both standa...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 3.0M runs

cjwbw/cogagent-chat

Answer questions about images and GUI screenshots. Takes an image and a natural-language query and returns a text respon...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 2.3K runs

openai/gpt-5-nano

Generates text responses based on prompts or conversation messages, with support for image input analysis. This is the f...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 13.9M runs

openai/gpt-4.1-mini

Generate text responses from prompts with support for image analysis and visual understanding. Fast, lightweight languag...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 2.6M runs

openai/gpt-5-mini

Generates text responses based on prompts or multi-turn conversations, designed as a faster and more cost-effective vers...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 2.3M runs

openai/gpt-4o-mini

Generates text responses from prompts using OpenAI's GPT-4o mini model with low latency and cost optimization. Supports...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 39.3M runs

openai/gpt-4o

Generates text responses from text prompts, messages, and images with multimodal capabilities. Processes both text and v...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 723.2K runs

openai/o1

Generate text responses with advanced reasoning capabilities, specializing in complex problem-solving across mathematics...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 18.7K runs

anthropic/claude-3.7-sonnet

Generate text responses based on prompts with support for image analysis. Features particularly strong capabilities in c...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 4.1M runs