image-analysis AI Models - Page 4

nelsonjchen/minigpt-4_vicuna-13b

Answer questions about images and generate detailed image captions using MiniGPT-4 with the Vicuna-13B language model. Takes...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 52.0K runs

deepseek-ai/deepseek-vl2

Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...

🖼️ → 📝 • image-to-text • ocr • text-generation • 98.6K runs

lidarbtc/kollava-v1.5

Answer questions about images in Korean. Take an image and a Korean prompt and generate Korean text for visual question...

🖼️ • image-captioning • image-analysis • visual-understanding • 66 runs

lucataco/internvl3_5-30b

Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...

🖼️ → 📝 • image-to-text • video-to-text • 63 runs

lucataco/qwen3-vl-8b-instruct

Analyze images and videos to generate detailed text descriptions and answers to questions. Supports both image and video...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 93.6K runs

zsxkib/idefics3

Answers questions about images and generates detailed captions based on visual content and text prompts. Processes both...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 2.7K runs

lucataco/fuyu-8b

Generates text responses based on image and text prompts using a multi-modal transformer architecture. Takes an image an...

🖼️ → 📝 • image-to-text • text-generation • 14.7K runs

arielreplicate/gscorecam-clip-analyzer

Visualize which regions of an image CLIP associates with a given text prompt. Generates a saliency heatmap, optionally o...

🖼️ • image-analysis • saliency-map • clip • 759 runs

google/gemini-3-pro

Generates text responses from prompts with advanced reasoning capabilities, supporting multimodal inputs including image...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 1.2M runs

google/gemini-2.5-flash

Generate text responses from text, image, video, and audio inputs with controllable reasoning depth. Supports up to 1 mi...

🖼️ → 📝 • text-generation • image-to-text • video-to-text • 6.8M runs

cjwbw/internlm-xcomposer

Answer questions and caption images from a text prompt and an optional image, returning text. Generate long-form text an...

🖼️ • image-captioning • image-analysis • visual-understanding • 164.4K runs

adirik/lightweight-openpose

Estimate 2D poses of multiple people in an image using a lightweight version of OpenPose. Outputs include 18 keypoints p...

🖼️ • pose-estimation • multi-person • 2d-pose • 1.6K runs

hayooucom/vision-model

Analyze images and generate detailed textual descriptions based on visual content. Supports input via image URLs or base...

📝 → 📝 • image-analysis • image-captioning • visual-understanding • 15.4K runs

microsoft/phi-4-multimodal-instruct

Generate text responses from text, image, and audio inputs. Perform image captioning and visual question answering, OCR,...

📝 → 📝 • text-generation • speech-to-text • image-captioning • 13.2K runs

moonshotai/kimi-k2.5

Analyze images and text to generate answers, working code, and polished documents. Takes a text prompt with an optional...

🖼️ → 📝 • text-generation • image-to-text • image-analysis • 7 runs

seeghost1019/google-medgemma-4b-non-clinical

Generates text responses to medical questions and analyzes medical images for research and educational purposes. Based o...

🖼️ → 📝 • text-generation • image-to-text • medical • 94 runs

openai/gpt-5.4

Generate text from prompts with configurable reasoning effort and verbosity for complex professional work, coding, and m...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 121.5K runs

anthropic/claude-opus-4.6

Generate text and analyze images with Anthropic's most advanced language model, featuring state-of-the-art coding, reaso...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 167.9K runs

google/gemini-3.1-pro

Advanced multimodal language model that processes text, images, videos, and audio to generate text responses. Features t...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 574.6K runs

anthropic/claude-opus-4.7

Generate text responses with advanced reasoning and visual understanding capabilities from text prompts and optional ima...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 90.2K runs