image-to-text AI Models - Page 6

lucataco/nemotron-nano-vl-8b-v1

Analyzes images and answers questions about their content using a document intelligence vision language model. Takes an...

🖼️ → 📝 • image-to-text • document-to-json • ocr • 17 runs

anthropic/claude-4.5-sonnet

Generates text responses based on prompts and can analyze images. Excels at coding tasks with state-of-the-art performan...

🖼️ → 📝 • text-generation • code-generation • code-understanding • 1.4M runs

lucataco/sdxl-clip-interrogator

Generates optimized text prompts that match a given image, specifically designed for SDXL image generation models. Takes...

🖼️ → 📝 • image-to-text • text-generation • 848.8K runs

smoretalk/clip-interrogator-turbo

Generate detailed SDXL-ready prompts from an input image. Use a CLIP-Interrogator-based pipeline to extract artists, sty...

🖼️ → 📝 • image-to-text • prompt-generation • 3.0M runs

bfirsh/resnet

Classify images into ImageNet-1k categories. Takes a single image as input and outputs ranked class labels (WordNet syns...

🖼️ • image-classification • imagenet • 184 runs

lucataco/magma-8b

Analyzes images and answers questions about visual content through multimodal conversation. Designed as a foundation mod...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 910 runs

cjwbw/uform-gen2-qwen-500m

Analyzes images and generates text responses to questions or instructions about visual content. This compact multimodal...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 413 runs

lucataco/florence-2-base

Performs multiple vision and vision-language tasks based on text prompts. Supports image captioning with varying detail...

🖼️ → 📝 • image-to-text • object-detection • ocr • 133.5K runs

lucataco/qwen-vl-chat

Analyzes images and answers questions about them through conversational interaction. Takes an image and a text prompt as...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 826.4K runs

lucataco/ollama-llama3.2-vision-90b

Generates text responses based on image and text inputs using Meta's Llama 3.2-Vision 90B multimodal language model. Per...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 4.6K runs

lucataco/florence-2-large

Analyze images to generate captions, detect objects, and extract text (OCR). Accepts an image plus a task selector and o...

🖼️ → 📝 • image-to-text • image-object-detection • ocr • 471.5K runs

chenxwh/cogvlm2

Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 6.6K runs

anthropic/claude-4.5-haiku

Generate text for chat, Q&A, coding, and document workflows with fast, low-latency responses. Accept text prompts and op...

📝 → 📝 • text-generation • image-captioning • image-analysis • 4.0K runs

lucataco/ollama-llama3.2-vision-11b

Generates text responses based on both text prompts and images using Meta's Llama 3.2 Vision 11B model. Analyzes and und...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 9.9K runs

yuni-eng/image-to-color

Extract dominant hex color codes and caption or answer questions about an input image. Accepts an image and an optional...

🖼️ → 📝 • palette-generation • image-to-text • 394 runs

w95/tinyclick

Automate GUI interactions by predicting where to click from a screenshot and a natural-language command. Takes a GUI scr...

gui-automation • visual-grounding • 28 runs

zsxkib/easyocr

Extract text with pixel coordinates from images and screenshots. Accepts an image and returns readable text (markdown) p...

🖼️ → 📝 • ocr • image-to-text • 104 runs

nohamoamary/nabtah-plant-disease

Classify plant leaf images into disease categories. Takes a single image as input and returns a text label for the predi...

🖼️ → 📝 • image-to-text • plant-disease-classification • 585 runs

microsoft/phi-4-multimodal-instruct

Generate text responses from text, image, and audio inputs. Perform image captioning and visual question answering, OCR,...

📝 → 📝 • text-generation • speech-to-text • image-captioning • 13.2K runs

jyoung105/honeybee

Analyzes images and generates text responses based on both the image content and text prompts. Uses a locality-enhanced...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 30 runs