image-to-text AI Models

lucataco/interactiveomni-8b

Processes multiple inputs simultaneously including images, audio, text, and video to generate coherent text and speech r...

📝 → 🔊 • text-generation • image-to-text • video-to-text • 86 runs

eiby777/olmocr-2-7b-1025-fp8

Extract text and tables from document images or PDFs. Accepts an image or a selected PDF page and returns structured tex...

🖼️ → 📝 • ocr • image-to-text • 10 runs

perceptron-ai-inc/isaac-0.1

Analyzes images and answers questions about visual content with spatially-aware responses. Takes an image and a text pro...

🖼️ → 📝 • image-to-text • visual-understanding • ocr • 39.1K runs

openai/gpt-5.1

Generate text responses from prompts or conversations with configurable reasoning effort and verbosity. Designed specifi...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 244.0K runs

ell-hol/clip-interrogator

Generate descriptive prompts for text-to-image models from a single image. Outputs a CLIP Interrogator-style prompt stri...

🖼️ → 📝 • image-to-text • prompt-generation • 652 runs

asppj/openclip

Caption images. Takes an image as input and outputs a short natural-language description (image-to-text) using OpenCLIP...

🖼️ → 📝 • image-to-text • 90 runs

google/gemini-3-pro

Generates text responses from prompts with advanced reasoning capabilities, supporting multimodal inputs including image...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 1.2M runs

zsxkib/molmo-7b

Answers questions and generates captions about images using a 7B parameter vision-language model. Based on Qwen2-7B and...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.3M runs

openai/gpt-5.2

Generate text responses with advanced reasoning capabilities for professional knowledge work, coding, and agentic tasks....

🖼️ → 📝 • text-generation • code-generation • image-to-text • 811.9K runs

cjwbw/unival

Caption images, videos, and audio; answer media-grounded questions; and localize referred objects via visual grounding....

🖼️ → 📝 • image-to-text • video-to-text • audio-to-text • 996 runs

moonshotai/kimi-k2.5

Analyze images and text to generate answers, working code, and polished documents. Takes a text prompt with an optional...

🖼️ → 📝 • text-generation • image-to-text • image-analysis • 7 runs

seeghost1019/google-medgemma-4b

Answer medical questions from text and optional medical images, returning explanatory text. Accept a prompt and optional...

🖼️ → 📝 • image-to-text • text-generation • medical • 4 runs

google/gemini-3-flash

Generates text responses from text prompts with support for multimodal inputs including images, videos, and audio. Combi...

🖼️ → 📝 • text-generation • image-to-text • video-to-text • 4.1M runs

google/gemini-3.1-pro

Advanced multimodal language model that processes text, images, videos, and audio to generate text responses. Features t...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 574.6K runs

openai/gpt-5.4

Generate text from prompts with configurable reasoning effort and verbosity for complex professional work, coding, and m...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 121.5K runs

anthropic/claude-opus-4.6

Generate text and analyze images with Anthropic's most advanced language model, featuring state-of-the-art coding, reaso...

🖼️ → 📝 • text-generation • image-to-text • code-generation • 167.9K runs

lucataco/idefics-9b

Answer questions about an input image from a text prompt, returning text. Generate image captions and short visual descr...

🖼️ → 📝 • image-to-text • visual-question-answering • image-captioning • 2.1K runs

anthropic/claude-opus-4.7

Generate text responses with advanced reasoning and visual understanding capabilities from text prompts and optional ima...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 90.2K runs

moonshotai/kimi-k2.6

Generate text, code, and engage in multi-modal conversations using Moonshot AI's 1 trillion parameter frontier model wit...

🖼️ → 📝 • text-generation • code-generation • image-to-text • 21.7K runs

seeghost1019/google-medgemma-4b-non-clinical

Generates text responses to medical questions and analyzes medical images for research and educational purposes. Based o...

🖼️ → 📝 • text-generation • image-to-text • medical • 94 runs