sljeff/dots.ocr 🖼️🔢📝 → 📝

▶️ 4.4K runs 📅 Aug 2025 ⚙️ Cog 0.16.2
document-layout document-to-json image-to-text ocr

About

https://github.com/sljeff/dots-ocr-client

Example Output

Prompt:

"

Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.

  1. Bbox format: [x1, y1, x2, y2]

  2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].

  3. Text Extraction & Formatting Rules:

    • Picture: For the 'Picture' category, the text field should be omitted.
    • Formula: Format its text as LaTeX.
    • Table: Format its text as HTML.
    • All Others (Text, Title, etc.): Format their text as Markdown.

  4. Constraints:

    • The output text must be the original text from the image, with no translation.
    • All layout elements must be sorted according to human reading order.

  5. Final Output: The entire output must be a single JSON object.

"

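The rules in the prompt above can be checked mechanically once a response has been parsed. The following is a minimal sketch, not part of the model or its client: `check_element` and `ALLOWED_CATEGORIES` are hypothetical names, and the test that `x1 < x2` and `y1 < y2` assumes the bbox lists the top-left corner first, which the prompt does not state explicitly.

```python
# Categories copied from rule 2 of the prompt.
ALLOWED_CATEGORIES = {
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
}


def check_element(element):
    """Return a list of rule violations for one layout element (empty if none)."""
    problems = []

    # Rule 1: bbox is a list of four numbers [x1, y1, x2, y2].
    bbox = element.get("bbox")
    if (not isinstance(bbox, list) or len(bbox) != 4
            or not all(isinstance(v, (int, float)) for v in bbox)):
        problems.append("bbox must be [x1, y1, x2, y2]")
    elif not (bbox[0] < bbox[2] and bbox[1] < bbox[3]):
        # Assumption: (x1, y1) is the top-left corner, (x2, y2) the bottom-right.
        problems.append("bbox corners out of order")

    # Rule 2: category must come from the fixed list.
    category = element.get("category")
    if category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {category!r}")

    # Rule 3: Picture omits the text field; every other category carries one.
    if category == "Picture" and "text" in element:
        problems.append("Picture must omit text")
    if category in ALLOWED_CATEGORIES and category != "Picture" and "text" not in element:
        problems.append("missing text")

    return problems
```

An empty list means the element satisfies the rules that can be checked without the source image; reading order and fidelity to the original text still need a human or a reference transcription.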
Output

[{"bbox": [1355, 199, 2103, 262], "category": "Page-header", "text": "A PREPRINT - SEPTEMBER 5, 2022"}, {"bbox": [698, 455, 1846, 1010], "category": "Picture"}, {"bbox": [441, 1051, 2103, 1153], "category": "Caption", "text": "Figure 7: Algorithmic performance measured in ES/s (effective samples per second), for the eight highest energy KL coefficients $\theta_k$, $k = 1, \dots, 8$, for both RWMH (blue) and MLDA (red)."}, {"bbox": [450, 1208, 1251, 1782], "category": "Picture"}, {"bbox": [1292, 1208, 2088, 1782], "category": "Picture"}, {"bbox": [526, 1821, 2021, 1878], "category": "Caption", "text": "Figure 8: The true (blue) and measured (red) densities of prey (left) and predators (right)."}, {"bbox": [441, 1970, 2103, 2073], "category": "Text", "text": "and perturbing the calculated values $N(t^*)$ and $P(t^*)$ with independent Gaussian noise $\epsilon \sim N(0, 1)$ (Fig. 8). Our aim is to predict the mean density of predators $\mathbb{E}(P)$ over the same period."}, {"bbox": [441, 2086, 2103, 2364], "category": "Text", "text": "The solutions of the ODE system in Eq. Eq. (36) can be approximated by a suitable numerical integration scheme. We use an explicit, adaptive Runge-Kutta method of order 5(4) [46]. For the finest level $l = 2$, we integrate over the entire time domain $T_2 = [0, 12]$ and use the entire dataset to compute the likelihood function, while for the coarse levels, we stop integration early, so that $T_1 = [0, 8]$ and $T_0 = [0, 4]$, and use only the corresponding subsets of the data to compute the likelihood functions."}, {"bbox": [441, 2381, 2103, 2528], "category": "Text", "text": "We assume that we possess some prior knowledge about the parameters, and use informed pri-\nors $N_0 \sim N(10.8, 1)$, $P_0 \sim N(5.3, 1)$, $a \sim N(2.5, 0.5)$, $b \sim \text{Inv-Gamma}(1.0, 0.5)$, $c \sim \text{Inv-Gamma}(1.0, 0.5)$ and $d \sim N(1.2, 0.3)$."}, {"bbox": [441, 2542, 2103, 2871], "category": "Text", "text": "To demonstrate the multilevel variance reduction feature, we ran the MLDA sampler with randomi-\nsation of the subchain length as described in Section 2.3 and then compared the (multilevel) MLDA\nestimator in Eq. Eq. (21), which uses both the coarse and fine samples, with a standard MCMC es-\ntimator based only on the samples produced by MLDA on the fine level. In both cases, we used the\nthree-level model hierarchy as described above and employed the Differential Evolution Markov\nChain (DE-MCZ) proposal [48] on the coarsest level. The coarsest level proposal kernel was au-\ntomatically tuned during burn-in to achieve an acceptance rate between 0.2 and 0.5. The subchain"}]
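Because the model returns its result as a string, a caller has to parse it before using the layout elements. The sketch below shows one way to turn a response like the one above into a readable document, relying on the prompt's guarantee that elements arrive in reading order. `elements_to_markdown` is a hypothetical helper, and it assumes the string is valid JSON; LaTeX backslashes that are not escaped as `\\` would make `json.loads` raise and would need separate handling.

```python
import json


def elements_to_markdown(raw_output):
    """Join the text of each layout element in the order the model returned it."""
    elements = json.loads(raw_output)  # the example response is a JSON array
    parts = []
    for element in elements:
        if element["category"] == "Picture":
            # Pictures carry no text, so emit a placeholder with their bbox.
            x1, y1, x2, y2 = element["bbox"]
            parts.append(f"[picture at ({x1}, {y1})-({x2}, {y2})]")
        else:
            parts.append(element.get("text", ""))
    # Blank lines between elements keep them as separate Markdown blocks.
    return "\n\n".join(parts)


# Two elements taken from the example output.
sample = json.dumps([
    {"bbox": [1355, 199, 2103, 262], "category": "Page-header",
     "text": "A PREPRINT - SEPTEMBER 5, 2022"},
    {"bbox": [698, 455, 1846, 1010], "category": "Picture"},
])
print(elements_to_markdown(sample))
```

Tables arrive as HTML and formulas as LaTeX, so a renderer that accepts both inside Markdown can display the joined text directly.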

Performance Metrics

6.99s Prediction Time
6.99s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/NWlQUUIAB0pn2n7niyz70GvU7lWZzadscMX9UNSCGnDv7IZI/docling-test-2_page-0020.jpg",
  "top_p": 1,
  "prompt": "Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.\n\n1. Bbox format: [x1, y1, x2, y2]\n\n2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].\n\n3. Text Extraction & Formatting Rules:\n    - Picture: For the 'Picture' category, the text field should be omitted.\n    - Formula: Format its text as LaTeX.\n    - Table: Format its text as HTML.\n    - All Others (Text, Title, etc.): Format their text as Markdown.\n\n4. Constraints:\n    - The output text must be the original text from the image, with no translation.\n    - All layout elements must be sorted according to human reading order.\n\n5. Final Output: The entire output must be a single JSON object.",
  "max_tokens": 16384,
  "temperature": 0.1
}
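A request with these parameters could be sent from Python roughly as follows. This is a sketch under assumptions: it uses the third-party `replicate` package (`pip install replicate`) with a `REPLICATE_API_TOKEN` set in the environment, `build_input` and `run_ocr` are hypothetical helper names, and the model reference combines the model name with the version ID listed on this page. Leaving `prompt` out is assumed to fall back to the default layout prompt.

```python
# Model name plus the version ID from the "Version Details" section of this page.
MODEL_REF = (
    "sljeff/dots.ocr:"
    "214a4fc47a5e8254ae83362a34271feeb53c5e61d9bc8aadcf96a5d8717be4d6"
)


def build_input(image_url, prompt=None, top_p=1, max_tokens=16384, temperature=0.1):
    """Assemble the input payload; defaults mirror the values documented here."""
    payload = {
        "image": image_url,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if prompt is not None:
        # Only send a prompt when overriding the model's default.
        payload["prompt"] = prompt
    return payload


def run_ocr(image_url, **options):
    """Run one prediction; the result is expected to be the output string."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=build_input(image_url, **options))
```

Calling `run_ocr("https://example.com/page.jpg")` with a real, publicly reachable image URL would start a prediction; the example URL here is a placeholder.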
Input Parameters
image (required) Type: string
Input image for OCR
top_p Type: number, Default: 1, Range: 0 - 1
Top-p sampling parameter
prompt Type: string, Default: Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox. 1. Bbox format: [x1, y1, x2, y2] 2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title']. 3. Text Extraction & Formatting Rules: - Picture: For the 'Picture' category, the text field should be omitted. - Formula: Format its text as LaTeX. - Table: Format its text as HTML. - All Others (Text, Title, etc.): Format their text as Markdown. 4. Constraints: - The output text must be the original text from the image, with no translation. - All layout elements must be sorted according to human reading order. 5. Final Output: The entire output must be a single JSON object.
Prompt to guide the extraction
max_tokens Type: integer, Default: 16384, Range: 1 - 32768
Maximum number of tokens to generate
temperature Type: number, Default: 0.1, Range: 0 - 2
Temperature for sampling (lower = more deterministic)
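The documented ranges can be checked on the client before a request is sent, which avoids spending a prediction on an input that would be rejected. This is a minimal sketch: `validate_parameters` and `PARAMETER_RANGES` are hypothetical names, and the ranges are assumed to be inclusive at both ends, which the page does not state.

```python
# Ranges copied from the parameter list; assumed inclusive at both ends.
PARAMETER_RANGES = {
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "temperature": (0, 2),
}


def validate_parameters(params):
    """Return a list of error messages for an input payload (empty if valid)."""
    errors = []

    # image is the only required parameter and must be a non-empty string.
    image = params.get("image")
    if not isinstance(image, str) or not image:
        errors.append("image is required and must be a string")

    for name, (low, high) in PARAMETER_RANGES.items():
        if name not in params:
            continue  # optional parameters fall back to their defaults
        value = params[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if name == "max_tokens" and not (isinstance(value, int) and not isinstance(value, bool)):
            errors.append("max_tokens must be an integer")
        elif not is_number or not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")

    return errors
```

Returning every problem at once, instead of raising on the first, lets a caller report all invalid fields in a single pass.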
Output Schema

Output

Type: string

Version Details
Version ID
214a4fc47a5e8254ae83362a34271feeb53c5e61d9bc8aadcf96a5d8717be4d6
Version Created
August 15, 2025