image-to-text AI Models - Page 2

cjwbw/cogvlm

Analyzes images and answers questions about them using a visual language model. Takes an image and a text query as input...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 1.5M runs

deepseek-ai/janus-pro-7b

Answers questions about images through multimodal understanding. Takes an image and a text question as input and generat...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 13.9K runs

lucataco/minicpm-v-4

Analyze images and videos with text prompts to generate detailed text responses. Handles single images, multiple images,...

🖼️ → 📝 • image-to-text • video-to-text • visual-understanding • 795 runs

lucataco/bakllava

Caption images and answer questions about images. Takes an image and a text prompt as input and returns text, enabling i...

🖼️ → 📝 • image-to-text • visual-question-answering • visual-understanding • 39.8K runs

lucataco/llama-3-vision-alpha

Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 6.8K runs

adirik/vila-2.7b

Analyzes images and generates text responses to questions about the visual content. Takes an image and text prompt as in...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 2.6K runs

lucataco/qvq-72b-preview

Analyzes images and answers questions about visual content with enhanced reasoning capabilities. Takes an image and text...

🖼️ → 📝 • image-to-text • visual-understanding • text-generation • 297 runs

naklecha/cogvlm

Analyzes images and responds to text prompts about visual content. Takes an image and a text prompt as input, then gener...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 12.5K runs

lucataco/idefics-8b

Accepts arbitrary sequences of image and text inputs to produce text outputs for multimodal tasks. Answers questions abo...

🖼️ → 📝 • image-to-text • visual-understanding • ocr • 1.2K runs

adirik/vila-7b

Answers questions about images using natural language. Takes an image and text prompt as input and generates contextual...

🖼️ → 📝 • image-to-text • visual-understanding • question-answering • 7.4K runs

cudanexus/ocr-surya

Extract text from images and PDFs in 90+ languages. Accept an image or multi-page PDF, a selected language list, and a p...

ocr • text-detection • 6.5K runs

mickeybeurskens/latex-ocr

Extract LaTeX code from images of mathematical equations and expressions. Takes a single image as input and returns the...

ocr • latex-ocr • 871 runs

ibm-granite/granite-vision-3.3-2b

Analyze documents and images from one or more image inputs plus a text prompt, returning text captions, OCR, and answers...

🖼️ → 📝 • image-to-text • ocr • image-captioning • 42.1K runs

muqtadar08/image_to_text

Converts images into text descriptions or captions.

🖼️ → 📝 • image-to-text • 43 runs

pengdaqian2020/image-tagger

Analyzes images and generates descriptive tags with confidence scores. Takes an image as input and returns an array of t...

🖼️ → 📝 • image-to-text • image-analysis • 42.5M runs

deepseek-ai/janus-pro-1b

Analyzes images and answers questions about them using a unified autoregressive framework for multimodal understanding....

🖼️ → 📝 • image-to-text • visual-understanding • multimodal • 6.7K runs

sljeff/dots.ocr

Extract structured document layout and text from an image input and return a single JSON output. Parse page elements wit...

🖼️ • ocr • document-to-json • image-object-detection • 4.8K runs

zsxkib/idefics3

Answers questions about images and generates detailed captions based on visual content and text prompts. Processes both...

🖼️ → 📝 • image-to-text • visual-understanding • image-analysis • 2.7K runs

methexis-inc/img2prompt

Generates text prompts from input images that can be used with Stable Diffusion to recreate similar-looking versions of...

🖼️ → 📝 • image-to-text • 2.7M runs

lucataco/smolvlm-instruct

Analyzes images and generates text responses based on visual content and text prompts. Accepts arbitrary sequences of im...

🖼️ → 📝 • image-to-text • visual-understanding • document-understanding • 8.3K runs