image-to-text AI Models - Page 5

jyoung105/moondream

Answer questions about images from a text prompt, returning a text response. Accepts an input image and a prompt and out...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 310 runs

🤖 Model 🖼️ → 📝

microsoft/omniparser-v2

Parse GUI screenshots into structured UI elements with bounding boxes and captions. Accepts an image of a desktop or mob...

🖼️ → 📝 • image-to-text • image-object-detection • ui-parsing • 185.5K runs

shreejalmaharjan-27/google-lens-scraper

Extracts text and information from images using Google Lens functionality, taking an image as input and returning extrac...

🖼️ → 📝 • ocr • image-to-text • 47.9K runs

lucataco/fuyu-8b

Generates text responses based on image and text prompts using a multi-modal transformer architecture. Takes an image an...

🖼️ → 📝 • image-to-text • text-generation • 14.7K runs

jigsawstack/vocr

Extract text and structured data from images and multi-page PDFs using visual OCR and layout analysis. Accept an image o...

🖼️ → 📝 • ocr • document-to-json • image-to-text • 20 runs

remodela-ai/recognize-anything

Recognizes and identifies objects, text, and other elements in images, returning structured information about detected i...

🖼️ → 📝 • image-to-text • image-object-detection • ocr • 234 runs

nohamoamary/image-captioning-with-visual-attention

Generates text captions describing the content of images using an attention-based neural network trained on the Flickr8k...

🖼️ → 📝 • image-to-text • 11.3K runs

ignaciosgithub/pllava

Answer questions about images from an image and a text prompt, returning text. Generate captions, short answers, and exp...

🖼️ → 📝 • image-to-text • visual-question-answering • 298 runs

lucataco/kosmos-2

Caption images with grounded object localization. Take an image as input and return a brief or detailed natural-language...

🖼️ → 📝 • image-to-text • image-object-detection • 1.9K runs

hiscodesmells/florence-2-base

Performs multiple computer vision tasks on images including captioning, object detection, OCR, and segmentation. Takes a...

🖼️ → 📝 • image-to-text • object-detection • ocr • 323 runs

jd7h/texify

Convert images or PDFs containing mathematical notation into Markdown/LaTeX text. Accept an image input and return a tex...

🖼️ → 📝 • ocr • image-to-text • 72 runs

cjwbw/pix2struct

Analyzes images and answers questions or generates captions based on textual prompts. Provides six specialized models tr...

🖼️ → 📝 • image-to-text • visual-question-answering • ocr • 6.1K runs

samim23/internlm-xcomposer2

Caption images and answer visual questions from a text prompt and optional image, returning text. Support long-context i...

🖼️ → 📝 • image-to-text • image-captioning • text-generation • 95 runs

idea-research/ram-grounded-sam

Tag and segment objects in images, returning labels, bounding boxes, and pixel masks. Accepts an image as input and outp...

🖼️ → 📝 • image-object-detection • image-segmentation • image-to-text • 1.5M runs

deepseek-ai/deepseek-vl2-small

Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Takes an imag...

🖼️ → 📝 • image-to-text • ocr • text-generation • 6.5K runs

chenxwh/deepseek-vl2

Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Processes sin...

🖼️ → 📝 • image-to-text • ocr • visual-understanding • 1.1K runs

zsxkib/kimi-vl-a3b-thinking

Answer questions about images and text with multimodal reasoning. Takes a text prompt with an optional image and outputs...

🖼️ → 📝 • image-to-text • text-generation • image-analysis • 988 runs

deepseek-ai/deepseek-vl-7b-base

Analyzes images and generates text descriptions or responses to prompts about visual content. Processes diverse image ty...

🖼️ → 📝 • image-to-text • visual-understanding • 6.6K runs

nohamoamary/image-description-base-model

Caption images. Takes a single image as input and returns a concise natural-language description of the scene, objects,...

🖼️ → 📝 • image-to-text • image-captioning • 1.3K runs

mopineyro/resnet_breeds_finetuned

Classify 37 dog and cat breeds from an input image, returning the predicted breed label. Uses a fine-tuned ResNet18 for...

🖼️ → 📝 • image-classification • image-to-text • 184 runs