sljeff/dots.ocr 🖼️🔢📝 → 📝
About
https://github.com/sljeff/dots-ocr-client

Example Output
Prompt:
"
Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
Bbox format: [x1, y1, x2, y2]
Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
Text Extraction & Formatting Rules:
- Picture: For the 'Picture' category, the text field should be omitted.
- Formula: Format its text as LaTeX.
- Table: Format its text as HTML.
- All Others (Text, Title, etc.): Format their text as Markdown.
Constraints:
- The output text must be the original text from the image, with no translation.
- All layout elements must be sorted according to human reading order.
Final Output: The entire output must be a single JSON object.
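Each element in the returned JSON follows the rules above: a bbox, a category, and (except for Picture) a text field. A rough sketch of that shape in Python, purely for illustration (the type name is mine, not part of the model; bbox and category are always present, text is omitted for Picture):

```python
from typing import List, TypedDict

class LayoutElement(TypedDict, total=False):
    bbox: List[int]   # [x1, y1, x2, y2]
    category: str     # one of the categories listed in the prompt, e.g. 'Text', 'Table', 'Picture'
    text: str         # LaTeX for Formula, HTML for Table, Markdown otherwise; omitted for Picture
```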
Output
[{"bbox": [1355, 199, 2103, 262], "category": "Page-header", "text": "A PREPRINT - SEPTEMBER 5, 2022"}, {"bbox": [698, 455, 1846, 1010], "category": "Picture"}, {"bbox": [441, 1051, 2103, 1153], "category": "Caption", "text": "Figure 7: Algorithmic performance measured in ES/s (effective samples per second), for the eight highest energy KL coefficients $\ heta_k$, $k = 1, \dots, 8$, for both RWMH (blue) and MLDA (red)."}, {"bbox": [450, 1208, 1251, 1782], "category": "Picture"}, {"bbox": [1292, 1208, 2088, 1782], "category": "Picture"}, {"bbox": [526, 1821, 2021, 1878], "category": "Caption", "text": "Figure 8: The true (blue) and measured (red) densities of prey (left) and predators (right)."}, {"bbox": [441, 1970, 2103, 2073], "category": "Text", "text": "and perturbing the calculated values $N(t^)$ and $P(t^)$ with independent Gaussian noise $\epsilon \sim N(0, 1)$ (Fig. 8). Our aim is to predict the mean density of predators $\mathbb{E}(P)$ over the same period."}, {"bbox": [441, 2086, 2103, 2364], "category": "Text", "text": "The solutions of the ODE system in Eq. Eq. (36) can be approximated by a suitable numerical integration scheme. We use an explicit, adaptive Runge-Kutta method of order 5(4) [46]. For the finest level $l = 2$, we integrate over the entire time domain $T_2 = [0, 12]$ and use the entire dataset to compute the likelihood function, while for the coarse levels, we stop integration early, so that $T_1 = [0, 8]$ and $T_0 = [0, 4]$, and use only the corresponding subsets of the data to compute the likelihood functions."}, {"bbox": [441, 2381, 2103, 2528], "category": "Text", "text": "We assume that we possess some prior knowledge about the parameters, and use informed pri-
ors $N_0 \sim N(10.8, 1)$, $P_0 \sim N(5.3, 1)$, $a \sim N(2.5, 0.5)$, $b \sim \ ext{Inv-Gamma}(1.0, 0.5)$, $c \sim \ ext{Inv-Gamma}(1.0, 0.5)$ and $d \sim N(1.2, 0.3)$."}, {"bbox": [441, 2542, 2103, 2871], "category": "Text", "text": "To demonstrate the multilevel variance reduction feature, we ran the MLDA sampler with randomi-
sation of the subchain length as described in Section 2.3 and then compared the (multilevel) MLDA
estimator in Eq. Eq. (21), which uses both the coarse and fine samples, with a standard MCMC es-
timator based only on the samples produced by MLDA on the fine level. In both cases, we used the
three-level model hierarchy as described above and employed the Differential Evolution Markov
Chain (DE-MCZ) proposal [48] on the coarsest level. The coarsest level proposal kernel was au-
tomatically tuned during burn-in to achieve an acceptance rate between 0.2 and 0.5. The subchain"}]
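To consume this output, one would parse the JSON array and walk the elements in reading order, handling each category according to the formatting rules in the prompt. A minimal sketch in Python, assuming the prediction is received as a JSON string (if your client already returns parsed objects, skip the json.loads step):

```python
import json

def layout_to_text(prediction: str) -> str:
    """Reassemble the recognised page in reading order."""
    elements = json.loads(prediction)
    parts = []
    for el in elements:
        if el["category"] == "Picture":
            continue  # Picture elements carry no "text" field
        # "text" is LaTeX for Formula, HTML for Table, Markdown for everything else
        parts.append(el["text"])
    return "\n\n".join(parts)
```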
Performance Metrics
- Prediction Time: 6.99s
- Total Time: 6.99s
All Input Parameters
{ "image": "https://replicate.delivery/pbxt/NWlQUUIAB0pn2n7niyz70GvU7lWZzadscMX9UNSCGnDv7IZI/docling-test-2_page-0020.jpg", "top_p": 1, "prompt": "Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.\n\n1. Bbox format: [x1, y1, x2, y2]\n\n2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].\n\n3. Text Extraction & Formatting Rules:\n - Picture: For the 'Picture' category, the text field should be omitted.\n - Formula: Format its text as LaTeX.\n - Table: Format its text as HTML.\n - All Others (Text, Title, etc.): Format their text as Markdown.\n\n4. Constraints:\n - The output text must be the original text from the image, with no translation.\n - All layout elements must be sorted according to human reading order.\n\n5. Final Output: The entire output must be a single JSON object.", "max_tokens": 16384, "temperature": 0.1 }
Input Parameters
- image (required): Input image for OCR
- top_p: Top-p sampling parameter
- prompt: Prompt to guide the extraction
- max_tokens: Maximum number of tokens to generate
- temperature: Temperature for sampling (lower = more deterministic)
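Taken together, these parameters map directly onto a prediction call. A minimal sketch using the Replicate Python client, with the image URL and settings from the example run above (the client in the linked dots-ocr-client repository could be used instead):

```python
import replicate

# Full prompt text as shown under "All Input Parameters" above (truncated here for brevity)
PROMPT = "Please output the layout information from the PDF image, ..."

output = replicate.run(
    "sljeff/dots.ocr",
    input={
        "image": "https://replicate.delivery/pbxt/NWlQUUIAB0pn2n7niyz70GvU7lWZzadscMX9UNSCGnDv7IZI/docling-test-2_page-0020.jpg",
        "prompt": PROMPT,
        "top_p": 1,
        "max_tokens": 16384,
        "temperature": 0.1,
    },
)
print(output)
```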
Output Schema
Version Details
- Version ID: 214a4fc47a5e8254ae83362a34271feeb53c5e61d9bc8aadcf96a5d8717be4d6
- Version Created: August 15, 2025
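To reproduce results against this exact build, the version ID can be pinned in the model reference when calling it, as in this sketch (the image URL is a hypothetical placeholder):

```python
import replicate

output = replicate.run(
    "sljeff/dots.ocr:214a4fc47a5e8254ae83362a34271feeb53c5e61d9bc8aadcf96a5d8717be4d6",
    input={"image": "https://example.com/scanned-page.jpg"},  # hypothetical placeholder URL
)
```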