visual-understanding AI Models - Page 2

yorickvp/llava-v1.6-mistral-7b

Multimodal language model that analyzes images and generates text responses based on visual content and text prompts. Bu...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 5.0M runs

baaivision/emu3-chat

Analyze images and return text responses for captioning and visual question answering. Accept an image and a natural-lan...

🖼️ → 📝 • image-to-text • image-captioning • image-analysis • 28 runs

lucataco/image-caption

Generate captions for images using a simple GPT-5-mini wrapper. Input an image and receive a descriptive text output tha...

📝 → 📝 • image-captioning • text-generation • visual-understanding • 0 runs

anthropic/claude-4.5-sonnet

Generates text responses based on prompts and can analyze images. Excels at coding tasks with state-of-the-art performan...

🖼️ → 📝 • text-generation • code-generation • code-understanding • 1.4M runs

deepseek-ai/janus-pro-1b

Analyzes images and answers questions about them using a unified autoregressive framework for multimodal understanding....

🖼️ → 📝 • image-to-text • visual-understanding • multimodal • 6.7K runs

google-deepmind/gemma-3-4b-it

Generate text based on text prompts and optional image inputs. This multimodal language model handles both text and imag...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 13.3K runs

nomagick/qwen-vl-chat

Generates text responses based on text prompts and images with ChatML prompt interface and streaming support. Accepts up...

🖼️ → 📝 • text-generation • image-to-text • image-analysis • 1.1K runs

deepseek-ai/deepseek-vl2

Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...

🖼️ → 📝 • image-to-text • ocr • text-generation • 98.6K runs

samim23/internlm-xcomposer2

Caption images and answer visual questions from a text prompt and optional image, returning text. Support long-context i...

🖼️ → 📝 • image-to-text • image-captioning • text-generation • 95 runs

deepseek-ai/deepseek-vl2-small

Analyzes images and answers questions about visual content using a Mixture-of-Experts vision-language model. Takes an imag...

🖼️ → 📝 • image-to-text • ocr • text-generation • 6.5K runs

deepseek-ai/deepseek-vl-7b-base

Analyzes images and generates text descriptions or responses to prompts about visual content. Processes diverse image ty...

🖼️ → 📝 • image-to-text • visual-understanding • 6.6K runs

cjwbw/cogvlm

Analyzes images and answers questions about them using a visual language model. Takes an image and a text query as input...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.5M runs

lucataco/qvq-72b-preview

Analyzes images and answers questions about visual content with enhanced reasoning capabilities. Takes an image and text...

🖼️ → 📝 • image-to-text • visual-understanding • text-generation • 297 runs

lucataco/magma-8b

Analyzes images and answers questions about visual content through multimodal conversation. Designed as a foundation mod...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 910 runs

zsxkib/molmo-7b

Answers questions and generates captions about images using a 7B parameter vision-language model. Based on Qwen2-7B and...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.3M runs

lucataco/llama-3-vision-alpha

Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 6.8K runs

yorickvp/llava-13b

Analyzes images and answers questions about them through conversational text generation. Combines visual understanding w...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 35.8M runs

google-deepmind/gemma-3-12b-it

Generates text responses from text prompts and optional image inputs. Supports multimodal capabilities for analyzing and...

🖼️ → 📝 • text-generation • image-to-text • question-answering • 25.2K runs

lucataco/smolvlm-instruct

Analyzes images and generates text responses based on visual content and text prompts. Accepts arbitrary sequences of im...

🖼️ → 📝 • image-to-text • visual-understanding • document-understanding • 8.3K runs

chenxwh/cogvlm2

Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 6.6K runs