sljeff/dots.ocr 🖼️🔢📝 → 📝
About
https://github.com/sljeff/dots-ocr-client

Example Output
Prompt:
"
Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
Bbox format: [x1, y1, x2, y2]
Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].
Text Extraction & Formatting Rules:
- Picture: For the 'Picture' category, the text field should be omitted.
- Formula: Format its text as LaTeX.
- Table: Format its text as HTML.
- All Others (Text, Title, etc.): Format their text as Markdown.
Constraints:
- The output text must be the original text from the image, with no translation.
- All layout elements must be sorted according to human reading order.
Final Output: The entire output must be a single JSON object.
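Each element in the returned JSON follows the rules above: a bbox, a category, and (except for Picture) a text field. A rough sketch of that shape in Python, purely for illustration (the type name is mine, not part of the model; bbox and category are always present, text is omitted for Picture):

```python
from typing import List, TypedDict

class LayoutElement(TypedDict, total=False):
    bbox: List[int]   # [x1, y1, x2, y2]
    category: str     # one of the categories listed in the prompt, e.g. 'Text', 'Table', 'Picture'
    text: str         # LaTeX for Formula, HTML for Table, Markdown otherwise; omitted for Picture
```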
Output
[{"bbox": [1355, 199, 2103, 262], "category": "Page-header", "text": "A PREPRINT - SEPTEMBER 5, 2022"}, {"bbox": [698, 455, 1846, 1010], "category": "Picture"}, {"bbox": [441, 1051, 2103, 1153], "category": "Caption", "text": "Figure 7: Algorithmic performance measured in ES/s (effective samples per second), for the eight highest energy KL coefficients $\ heta_k$, $k = 1, \dots, 8$, for both RWMH (blue) and MLDA (red)."}, {"bbox": [450, 1208, 1251, 1782], "category": "Picture"}, {"bbox": [1292, 1208, 2088, 1782], "category": "Picture"}, {"bbox": [526, 1821, 2021, 1878], "category": "Caption", "text": "Figure 8: The true (blue) and measured (red) densities of prey (left) and predators (right)."}, {"bbox": [441, 1970, 2103, 2073], "category": "Text", "text": "and perturbing the calculated values $N(t^)$ and $P(t^)$ with independent Gaussian noise $\epsilon \sim N(0, 1)$ (Fig. 8). Our aim is to predict the mean density of predators $\mathbb{E}(P)$ over the same period."}, {"bbox": [441, 2086, 2103, 2364], "category": "Text", "text": "The solutions of the ODE system in Eq. Eq. (36) can be approximated by a suitable numerical integration scheme. We use an explicit, adaptive Runge-Kutta method of order 5(4) [46]. For the finest level $l = 2$, we integrate over the entire time domain $T_2 = [0, 12]$ and use the entire dataset to compute the likelihood function, while for the coarse levels, we stop integration early, so that $T_1 = [0, 8]$ and $T_0 = [0, 4]$, and use only the corresponding subsets of the data to compute the likelihood functions."}, {"bbox": [441, 2381, 2103, 2528], "category": "Text", "text": "We assume that we possess some prior knowledge about the parameters, and use informed pri-
ors $N_0 \sim N(10.8, 1)$, $P_0 \sim N(5.3, 1)$, $a \sim N(2.5, 0.5)$, $b \sim \ ext{Inv-Gamma}(1.0, 0.5)$, $c \sim \ ext{Inv-Gamma}(1.0, 0.5)$ and $d \sim N(1.2, 0.3)$."}, {"bbox": [441, 2542, 2103, 2871], "category": "Text", "text": "To demonstrate the multilevel variance reduction feature, we ran the MLDA sampler with randomi-
sation of the subchain length as described in Section 2.3 and then compared the (multilevel) MLDA
estimator in Eq. Eq. (21), which uses both the coarse and fine samples, with a standard MCMC es-
timator based only on the samples produced by MLDA on the fine level. In both cases, we used the
three-level model hierarchy as described above and employed the Differential Evolution Markov
Chain (DE-MCZ) proposal [48] on the coarsest level. The coarsest level proposal kernel was au-
tomatically tuned during burn-in to achieve an acceptance rate between 0.2 and 0.5. The subchain"}]
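To consume this output, one would parse the JSON array and walk the elements in reading order, handling each category according to the formatting rules in the prompt. A minimal sketch in Python, assuming the prediction is received as a JSON string (if your client already returns parsed objects, skip the json.loads step):

```python
import json

def layout_to_text(prediction: str) -> str:
    """Reassemble the recognised page in reading order."""
    elements = json.loads(prediction)
    parts = []
    for el in elements:
        if el["category"] == "Picture":
            continue  # Picture elements carry no "text" field
        # "text" is LaTeX for Formula, HTML for Table, Markdown for everything else
        parts.append(el["text"])
    return "\n\n".join(parts)
```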
Performance Metrics
- Prediction Time: 6.99s
- Total Time: 6.99s
All Input Parameters
{ "image": "https://replicate.delivery/pbxt/NWlQUUIAB0pn2n7niyz70GvU7lWZzadscMX9UNSCGnDv7IZI/docling-test-2_page-0020.jpg", "top_p": 1, "prompt": "Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.\n\n1. Bbox format: [x1, y1, x2, y2]\n\n2. Layout Categories: The possible categories are ['Caption', 'Footnote', 'Formula', 'List-item', 'Page-footer', 'Page-header', 'Picture', 'Section-header', 'Table', 'Text', 'Title'].\n\n3. Text Extraction & Formatting Rules:\n - Picture: For the 'Picture' category, the text field should be omitted.\n - Formula: Format its text as LaTeX.\n - Table: Format its text as HTML.\n - All Others (Text, Title, etc.): Format their text as Markdown.\n\n4. Constraints:\n - The output text must be the original text from the image, with no translation.\n - All layout elements must be sorted according to human reading order.\n\n5. Final Output: The entire output must be a single JSON object.", "max_tokens": 16384, "temperature": 0.1 }
Input Parameters
- image (required): Input image for OCR
- top_p: Top-p sampling parameter
- prompt: Prompt to guide the extraction
- max_tokens: Maximum number of tokens to generate
- temperature: Temperature for sampling (lower = more deterministic)
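Taken together, these parameters map directly onto a prediction call. A minimal sketch using the Replicate Python client, with the image URL and settings from the example run above (the client in the linked dots-ocr-client repository could be used instead):

```python
import replicate

# Full prompt text as shown under "All Input Parameters" above (truncated here for brevity)
PROMPT = "Please output the layout information from the PDF image, ..."

output = replicate.run(
    "sljeff/dots.ocr",
    input={
        "image": "https://replicate.delivery/pbxt/NWlQUUIAB0pn2n7niyz70GvU7lWZzadscMX9UNSCGnDv7IZI/docling-test-2_page-0020.jpg",
        "prompt": PROMPT,
        "top_p": 1,
        "max_tokens": 16384,
        "temperature": 0.1,
    },
)
print(output)
```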
Output Schema
Version Details
- Version ID: 214a4fc47a5e8254ae83362a34271feeb53c5e61d9bc8aadcf96a5d8717be4d6
- Version Created: August 15, 2025
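To reproduce results against this exact build, the version ID can be pinned in the model reference when calling it, as in this sketch (the image URL is a hypothetical placeholder):

```python
import replicate

output = replicate.run(
    "sljeff/dots.ocr:214a4fc47a5e8254ae83362a34271feeb53c5e61d9bc8aadcf96a5d8717be4d6",
    input={"image": "https://example.com/scanned-page.jpg"},  # hypothetical placeholder URL
)
```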