OCR AI Models - Cloudernative

abiruyt/text-extract-ocr

Extract text from images using optical character recognition (OCR). Takes an image as input and returns the extracted te...

ocr • 90.4M runs

bytedance/dolphin

Convert PDFs or document images into Markdown or structured JSON with layout-aware OCR and element parsing. Perform page...

pdf-to-markdown • document-to-json • ocr • 968 runs

lucataco/qwen2-vl-7b-instruct

Analyze images and videos with text prompts to generate detailed descriptions, answer questions, and extract information...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 547.2K runs

nomagick/qwen-vl-chat

Generates text responses based on text prompts and images with ChatML prompt interface and streaming support. Accepts up...

🖼️ → 📝 • text-generation • image-to-text • image-analysis • 1.1K runs

ibm-granite/granite-vision-3.2-2b

Extract content and answer questions from images of documents. Takes an image plus a text prompt or question and outputs...

🖼️ → 📝 • image-to-text • ocr • 118.0K runs

cuuupid/glm-4v-9b

Generates text responses to questions about images with multimodal understanding capabilities. Takes an image and text p...

🖼️ → 📝 • image-to-text • ocr • text-generation • 93.8K runs

lucataco/moondream1

Answer questions about images. Takes an image and a text prompt, and returns a text answer, enabling visual question ans...

🖼️ → 📝 • image-to-text • ocr • visual-question-answering • 11.5K runs

deepseek-ai/deepseek-vl2

Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...

🖼️ → 📝 • image-to-text • ocr • text-generation • 98.6K runs

sai88uk/minicpm-v-45-v9

Analyzes single images, multiple images, and high-FPS videos to answer questions about their content. Supports controlla...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 14.3K runs

deepseek-ai/janus-pro-7b

Answers questions about images through multimodal understanding. Takes an image and a text question as input and generat...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 13.9K runs

lucataco/llama-3-vision-alpha

Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...

🖼️ → 📝 • image-to-text • text-generation • visual-understanding • 6.8K runs

lucataco/idefics-8b

Accepts arbitrary sequences of image and text inputs to produce text outputs for multimodal tasks. Answers questions abo...

🖼️ → 📝 • image-to-text • visual-understanding • ocr • 1.2K runs

cudanexus/nougat

Extract text and structure from academic PDFs using OCR. Accepts a PDF file and outputs a machine-readable transcription...

ocr • pdf-to-markdown • 242 runs

cuuupid/marker

Convert scanned or digital documents to Markdown. Accepts PDF, EPUB, MOBI, XPS, and FB2 inputs and outputs Markdown plus...

pdf-to-markdown • ocr • 2.8K runs

cudanexus/ocr-surya

Extract text from images and PDFs in 90+ languages. Accept an image or multi-page PDF, a selected language list, and a p...

ocr • text-detection • 6.5K runs

awilliamson10/meta-nougat

Convert academic PDF documents into Markdown text. Accept a PDF and return extracted text with document structure (headi...

pdf-to-markdown • ocr • 4.8K runs

mickeybeurskens/latex-ocr

Extract LaTeX code from images of mathematical equations and expressions. Takes a single image as input and returns the...

ocr • latex-ocr • 871 runs

willywongi/donut

Extract structured data from receipt images into JSON. Input a receipt image; output structured key–value fields and lin...

document-to-json • ocr • receipt-parsing • 2.2K runs

ibm-granite/granite-vision-3.3-2b

Analyze documents and images from one or more image inputs plus a text prompt, returning text captions, OCR, and answers...

🖼️ → 📝 • image-to-text • ocr • image-captioning • 42.1K runs

deepseek-ai/janus-pro-1b

Analyzes images and answers questions about them using a unified autoregressive framework for multimodal understanding....

🖼️ → 📝 • image-to-text • visual-understanding • multimodal • 6.7K runs