ghostljj/deepseek-ocr

86 runs · Oct 2025 · Cog 0.16.8

About

https://github.com/deepseek-ai/DeepSeek-OCR

Example Output

{"text":"## Aspire OCR and Barcode Recognition High performance, royalty- free OCR and barcode recognition on Windows, Linux, Mac OS and Unix. Aspire OCR (optical character recognition) and barcode recognition SDK offers a high performance library for you to equip your Java applications (Java applets, web applications, Swing/JavaFX components, JEE enterprise applications), C#/VB.NET applications, and C/C++/Python applications with functionality of extracting text and barcode information from scanned documents. ## Convert Images To Searchable PDF With a few lines of code, you can convert various formats of images such as JPEG, PNG, and TIFF into searchable PDF.

PDF Output Formats	Remarks
PDF	Normal PDF
PDF/A	ISO 19005

## All Popular Barcode Formats All popular barcode formats are supported: EAN- 8, EAN- 13, UPC- A, UPC- E, ISBN- 10, ISBN- 13, Interleaved 2 of 5, Code 39, Code 128, PDF417, and QR Code.","image":"https://replicate.delivery/xezq/ehJftpJOS6tEhkvpojF4MN14Vtc7zWMvuQ5FNJgvMfnjEnJrA/result_with_boxes.jpg","markdown":"

Aspire OCR and Barcode Recognition

High performance, royalty- free OCR and barcode recognition on Windows, Linux, Mac OS and Unix.

Aspire OCR (optical character recognition) and barcode recognition SDK offers a high performance library for you to equip your Java applications (Java applets, web applications, Swing/JavaFX components, JEE enterprise applications), C#/VB.NET applications, and C/C++/Python applications with functionality of extracting text and barcode information from scanned documents.

Convert Images To Searchable PDF

With a few lines of code, you can convert various formats of images such as JPEG, PNG, and TIFF into searchable PDF.

PDF Output Formats	Remarks
PDF	Normal PDF
PDF/A	ISO 19005

All Popular Barcode Formats

All popular barcode formats are supported: EAN- 8, EAN- 13, UPC- A, UPC- E, ISBN- 10, ISBN- 13, Interleaved 2 of 5, Code 39, Code 128, PDF417, and QR Code.

","text_path":"https://replicate.delivery/xezq/cZ0Qhz9fjh1OBiWUqSTcwxfeJXyDpBeeHjTCOscz2iYPScmsC/text.txt","coordinates":[{"bbox":[130,72,624,97],"text":"## Aspire OCR and Barcode Recognition","type":"sub_title"},{"bbox":[130,107,855,144],"text":"High performance, royalty- free OCR and barcode recognition on Windows, Linux, Mac OS and Unix.","type":"text"},{"bbox":[130,153,855,245],"text":"Aspire OCR (optical character recognition) and barcode recognition SDK offers a high performance library for you to equip your Java applications (Java applets, web applications, Swing/JavaFX components, JEE enterprise applications), C#/VB.NET applications, and C/C++/Python applications with functionality of extracting text and barcode information from scanned documents.","type":"text"},{"bbox":[130,260,484,281],"text":"## Convert Images To Searchable PDF","type":"sub_title"},{"bbox":[130,290,867,327],"text":"With a few lines of code, you can convert various formats of images such as JPEG, PNG, and TIFF into searchable PDF.","type":"text"},{"bbox":[130,335,869,392],"text":"

PDF Output Formats	Remarks
PDF	Normal PDF
PDF/A	ISO 19005

","type":"table"},{"bbox":[131,405,422,425],"text":"## All Popular Barcode Formats","type":"sub_title"},{"bbox":[130,434,830,472],"text":"All popular barcode formats are supported: EAN- 8, EAN- 13, UPC- A, UPC- E, ISBN- 10, ISBN- 13, Interleaved 2 of 5, Code 39, Code 128, PDF417, and QR Code.","type":"text"},{"bbox":[131,508,400,560],"text":"","type":"image"},{"bbox":[576,512,768,655],"text":"","type":"image"}],"markdown_path":"https://replicate.delivery/xezq/srWXcDmjc9Y7Axv4fzePXWgehzxmQnTIQM9qlxnO8R2jEnJrA/result.md","extracted_images":[{"bbox":[131,508,400,560],"path":"https://replicate.delivery/xezq/oEeAuQcfrJqeQommzz182ubP1ioZ3sh4uSJDTcGPXVMiEnJrA/0.jpg","label":""},{"bbox":[576,512,768,655],"path":"https://replicate.delivery/xezq/FfosfDMe4CYhyohs5iEiphGGLqJgxPojKvwnYLLcfSSFJOTWB/1.jpg","label":""}]}

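The `coordinates` array in the example output pairs each detected block with a `type`, its `text`, and a `bbox`. The sketch below shows one way to group those items by type and measure a box. It uses a trimmed copy of the example output, and it assumes `bbox` is ordered `[x1, y1, x2, y2]`, which the sample values are consistent with but the page does not state. The helper names `group_by_type` and `box_size` are illustrative, not part of the model's API.

```python
from collections import defaultdict

# Trimmed copy of the example output: three of the ten "coordinates" items.
result = {
    "coordinates": [
        {"bbox": [130, 72, 624, 97],
         "text": "## Aspire OCR and Barcode Recognition", "type": "sub_title"},
        {"bbox": [130, 260, 484, 281],
         "text": "## Convert Images To Searchable PDF", "type": "sub_title"},
        {"bbox": [131, 508, 400, 560], "text": "", "type": "image"},
    ]
}


def group_by_type(coordinates):
    """Group coordinate items by their "type" field (hypothetical helper)."""
    groups = defaultdict(list)
    for item in coordinates:
        groups[item["type"]].append(item)
    return dict(groups)


def box_size(bbox):
    """Return (width, height), assuming bbox is [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1


groups = group_by_type(result["coordinates"])
# Strip the leading markdown "## " marker from each sub-title.
headings = [item["text"].lstrip("# ") for item in groups["sub_title"]]

for heading in headings:
    print(heading)
print(box_size(result["coordinates"][0]["bbox"]))
```
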
Performance Metrics

17.16s Prediction Time

151.81s Total Time

All Input Parameters

{
  "mode": "gundam",
  "image": "https://khang119966-deepseek-ocr-demo.hf.space/gradio_api/file=/tmp/gradio/d1d01f161fba31282f2e95057dc3f8d177257ed4932c9e01676dea1d8e3e3a59/doc_markdown.png",
  "task_type": "markdown",
  "custom_prompt": ""
}

Input Parameters

mode (default: gundam): Processing mode: tiny (512×512), small (640×640), base (1024×1024), large (1280×1280), or gundam (adaptive 640+1024)
image (required, type: string): Input image URL or path for OCR
task_type (default: markdown): Task type: markdown (convert to markdown), ocr (standard OCR), free_ocr (OCR without layout), parse_figure (parse figures/diagrams), describe (detailed description), or locate (locate objects by reference)
custom_prompt (type: string, default: empty): Custom prompt (optional, overrides task_type if provided)
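As a rough illustration of the parameters above, the sketch below builds and validates an input payload, then shows how it might be passed to the Replicate Python client. The helpers `build_input` and `run_ocr` are hypothetical names introduced here. The remote call assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set, so it is left in a function that is not invoked by default.

```python
# Allowed values, taken from the parameter descriptions on this page.
MODES = {"tiny", "small", "base", "large", "gundam"}
TASK_TYPES = {"markdown", "ocr", "free_ocr", "parse_figure", "describe", "locate"}

# Version ID as listed under "Version Details" on this page.
VERSION = "1b54870bb929efd83102612fe55ae98d6b80985177f316827499b82384648b20"


def build_input(image, mode="gundam", task_type="markdown", custom_prompt=""):
    """Build the input dict, rejecting values the page does not list (hypothetical helper)."""
    if not image:
        raise ValueError("image is required")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    return {
        "image": image,
        "mode": mode,
        "task_type": task_type,
        # Per the page, a non-empty custom_prompt overrides task_type.
        "custom_prompt": custom_prompt,
    }


def run_ocr(payload, version=VERSION):
    """Send the payload to Replicate; needs network access and an API token."""
    import replicate  # imported lazily so the payload helper works without it

    return replicate.run(f"ghostljj/deepseek-ocr:{version}", input=payload)


payload = build_input("https://example.com/page.png")  # placeholder URL
print(payload)
```

Validating locally keeps a typo in `mode` or `task_type` from costing a cold start before the error surfaces.
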

Example Execution Logs

First prediction call - loading model...
Loading model for the first time...
✓ Found local model at: /src/data/models/DeepSeek-OCR
✓ Using local model: /src/data/models/DeepSeek-OCR
Loading tokenizer from /src/data/models/DeepSeek-OCR...
✓ Tokenizer loaded successfully
Loading model from /src/data/models/DeepSeek-OCR...
Loading model with float32 (CUDA compatible)...
You are using a model of type deepseek_vl_v2 to instantiate a model of type DeepseekOCR. This is not supported for all configurations of models and can yield errors.
Some weights of DeepseekOCRForCausalLM were not initialized from the model checkpoint at /src/data/models/DeepSeek-OCR and are newly initialized: ['model.vision_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
✓ Model loaded with float32
Checking CUDA availability...
✓ Model loaded to GPU
✓ Model cached and ready for inference
Using task type: markdown with prompt: <image>
<|grounding|>Convert the document to markdown.
✓ Created output directory: results/20251101_040320_bf0b2b13
DEBUG: Current working directory: /src
DEBUG: Relative output directory: results/20251101_040320_bf0b2b13
DEBUG: Absolute output directory: /src/results/20251101_040320_bf0b2b13
Saving inference results to: results/20251101_040320_bf0b2b13
Processing image: https://khang119966-deepseek-ocr-demo.hf.space/gradio_api/file=/tmp/gradio/d1d01f161fba31282f2e95057dc3f8d177257ed4932c9e01676dea1d8e3e3a59/doc_markdown.png
Using device: cuda
Using mode: gundam (base_size=1024, image_size=640, crop_mode=True)
Downloading image from URL: https://khang119966-deepseek-ocr-demo.hf.space/gradio_api/file=/tmp/gradio/d1d01f161fba31282f2e95057dc3f8d177257ed4932c9e01676dea1d8e3e3a59/doc_markdown.png
✓ Image downloaded to: /tmp/deepseek_ocr___2jra5p/doc_markdown.png
/root/.pyenv/versions/3.12.11/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
=====================
BASE:  torch.Size([1, 256, 1280])
PATCHES:  torch.Size([6, 100, 1280])
=====================
The attention layers in this model are transitioning from computing the RoPE embeddings internally through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed `position_embeddings` (Tuple of tensors, containing cos and sin). In v4.46 `position_ids` will be removed and `position_embeddings` will be mandatory.
<|ref|>sub_title<|/ref|><|det|>[[130, 72, 624, 97]]<|/det|>
## Aspire OCR and Barcode Recognition
<|ref|>text<|/ref|><|det|>[[130, 107, 855, 144]]<|/det|>
High performance, royalty- free OCR and barcode recognition on Windows, Linux, Mac OS and Unix.
<|ref|>text<|/ref|><|det|>[[130, 153, 855, 245]]<|/det|>
Aspire OCR (optical character recognition) and barcode recognition SDK offers a high performance library for you to equip your Java applications (Java applets, web applications, Swing/JavaFX components, JEE enterprise applications), C#/VB.NET applications, and C/C++/Python applications with functionality of extracting text and barcode information from scanned documents.
<|ref|>sub_title<|/ref|><|det|>[[130, 260, 484, 281]]<|/det|>
## Convert Images To Searchable PDF
<|ref|>text<|/ref|><|det|>[[130, 290, 867, 327]]<|/det|>
With a few lines of code, you can convert various formats of images such as JPEG, PNG, and TIFF into searchable PDF.
<|ref|>table<|/ref|><|det|>[[130, 335, 869, 392]]<|/det|>
<table><tr><td>PDF Output Formats</td><td>Remarks</td></tr><tr><td>PDF</td><td>Normal PDF</td></tr><tr><td>PDF/A</td><td>ISO 19005</td></tr></table>
<|ref|>sub_title<|/ref|><|det|>[[131, 405, 422, 425]]<|/det|>
## All Popular Barcode Formats
<|ref|>text<|/ref|><|det|>[[130, 434, 830, 472]]<|/det|>
All popular barcode formats are supported: EAN- 8, EAN- 13, UPC- A, UPC- E, ISBN- 10, ISBN- 13, Interleaved 2 of 5, Code 39, Code 128, PDF417, and QR Code.
<|ref|>image<|/ref|><|det|>[[131, 508, 400, 560]]<|/det|>
<|ref|>image<|/ref|><|det|>[[576, 512, 768, 655]]<|/det|>
===============save results:===============
image:   0%|          | 0/2 [00:00<?, ?it/s]
image: 100%|██████████| 2/2 [00:00<00:00, 40920.04it/s]
other:   0%|          | 0/8 [00:00<?, ?it/s]
other: 100%|██████████| 8/8 [00:00<00:00, 159025.74it/s]
✓ Parsed coordinates: 10 items
First coordinate example: {'type': 'sub_title', 'text': '## Aspire OCR and Barcode Recognition', 'bbox': [130, 72, 624, 97]}
✓ Text output saved to: results/20251101_040320_bf0b2b13/text.txt
✓ Extracted 10 coordinate(s)
✓ Markdown saved to: results/20251101_040320_bf0b2b13/result.md
✓ Image saved to: results/20251101_040320_bf0b2b13/result_with_boxes.jpg
✓ Image path added to result
✓ Extracted 2 image(s) from images directory
✓ Images directory saved to: results/20251101_040320_bf0b2b13/images
✓ Added 2 extracted image(s) to result
✓ All output files saved to: results/20251101_040320_bf0b2b13
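The logs above show the raw model output as `<|ref|>type<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>` tags followed by the block's text, and then a parsed coordinate item with `type`, `text`, and `bbox` keys. The following is a minimal sketch of how such tags could be turned into items of that shape. It is not the wrapper's actual parsing code, and `parse_grounded` is a hypothetical name.

```python
import json
import re

# One block: a ref tag, a det tag holding a list of boxes, then optional text
# running up to the next ref tag or the end of the string.
BLOCK_RE = re.compile(
    r"<\|ref\|>(?P<type>.*?)<\|/ref\|>"
    r"<\|det\|>(?P<det>\[\[.*?\]\])<\|/det\|>\n?"
    r"(?P<text>.*?)(?=<\|ref\|>|\Z)",
    re.DOTALL,
)


def parse_grounded(raw):
    """Convert grounded model output into a list of coordinate dicts."""
    items = []
    for match in BLOCK_RE.finditer(raw):
        # The det payload is a JSON-compatible list of boxes.
        for bbox in json.loads(match.group("det")):
            items.append({
                "type": match.group("type"),
                "text": match.group("text").strip(),
                "bbox": bbox,
            })
    return items


# Two blocks copied from the execution logs on this page.
raw = (
    "<|ref|>sub_title<|/ref|><|det|>[[130, 72, 624, 97]]<|/det|>\n"
    "## Aspire OCR and Barcode Recognition\n"
    "<|ref|>image<|/ref|><|det|>[[131, 508, 400, 560]]<|/det|>\n"
)
items = parse_grounded(raw)
print(items[0])
```
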

Version Details

Version ID: 1b54870bb929efd83102612fe55ae98d6b80985177f316827499b82384648b20
Version Created: November 1, 2025
