image-to-text AI Models - Page 7

pku-yuangroup/llava-cot

Answer questions about images with step-by-step reasoning. Take an image and an optional text prompt and output text, in...

🖼️ → 📝 • image-to-text • image-captioning • image-analysis • 46 runs

jimothyjohn/phi3-vision-instruct

Analyze images and answer questions about them using text prompts. Built on Microsoft's Phi-3.5-Vision-Instruct, a light...

🖼️ → 📝 • image-to-text • visual-understanding • text-generation • 208 runs

justmalhar/meta-llama-3.2-11b-vision

Analyze images and generate text responses based on visual content and text prompts. Based on Meta's Llama 3.2 11B visio...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 312 runs

datalab-to/ocr

Extract text from images and documents in 90+ languages with OCR, returning plain text plus optional structured layout....

ocr • layout-analysis • table-recognition • 426 runs

openai/gpt-5-mini

Generates text responses based on prompts or multi-turn conversations, designed as a faster and more cost-effective vers...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 2.3M runs

yoadtew/arithmetic

Generates descriptive text captions from three input images using arithmetic operations on image features. The model com...

🖼️ → 📝 • image-to-text • image-captioning • visual-understanding • 94 runs

replicate/resnet

Classify images into categories. Accepts a single image and returns top predicted classes with probabilities using a Res...

🖼️ → 📝 • image-to-text • image-classification • 8.8K runs

lucataco/deepseek-ocr

Converts images containing documents, PDFs, charts, and handwritten text into structured markdown while preserving forma...

ocr • pdf-to-markdown • document-to-json • 93.6K runs

lhungting-ship-it/id-card-nationality

Classify the nationality of a national ID card from an image. Accepts a photo or scan of an ID document and returns the...

🖼️ • image-classification • document-classification • id-card • 11 runs

rmokady/clip_prefix_caption

Generate text captions describing the content of input images. Uses CLIP visual encoding combined with GPT-2 language ge...

🖼️ → 📝 • image-to-text • 1.7M runs

yoadtew/zero-shot-image-to-text

Generate text captions from images using a zero-shot approach. Takes an image input and produces descriptive text output...

🖼️ → 📝 • image-to-text • 6.7K runs

google/gemini-2.5-flash

Generate text responses from text, image, video, and audio inputs with controllable reasoning depth. Supports up to 1 mi...

🖼️ → 📝 • text-generation • image-to-text • video-to-text • 6.8M runs

nateraw/vit-age-classifier

Classify age range from an input image. Accepts a single image and predicts a discrete age-range label with a confidence...

🖼️ • image-classification • age-estimation • 901 runs

lucataco/internvl3_5-30b

Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...

🖼️ → 📝 • image-to-text • video-to-text • 63 runs

nvidia/nemotron-nano-v2-12b-vl

Analyzes images and videos to answer questions, extract data, and provide detailed descriptions. Supports processing up...

🖼️ → 📝 • image-to-text • video-to-text • document-to-json • 988 runs

lucataco/qwen3-vl-8b-instruct

Analyze images and videos to generate detailed text descriptions and answers to questions. Supports both image and video...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 93.6K runs

zsxkib/uform-gen

Generates text captions and answers questions about images using a fast 1.5B parameter multimodal language model. Takes...

🖼️ → 📝 • image-to-text • text-generation • image-captioning • 2.4K runs

kojott/content-moderation-vision

Moderate images for safety and policy compliance. Takes an image input (optionally a custom prompt) and outputs structur...

🖼️ → 📝 • image-nsfw-detection • image-to-text • content-moderation • 565 runs

lucataco/joy-caption-pre-alpha

Caption images by generating detailed, paragraph-length natural-language descriptions from a single image input. Outputs...

🖼️ → 📝 • image-to-text • 430 runs

ghostljj/deepseek-ocr

Extract text and convert documents to markdown format from images using optical character recognition. Supports multiple...

🖼️ → 📝 • ocr • pdf-to-markdown • document-to-json • 92 runs