cuuupid/marker

▶️ 2.8K runs 📅 Dec 2023 ⚙️ Cog 0.8.3
document-to-json ocr pdf-to-markdown

About

Convert scanned or electronic documents to markdown, very very very fast

Example Output

Output

{"markdown":"https://replicate.delivery/pbxt/8UKRTTBKhe3uTi3sEFdYdfK9YWAvoDC50dU2NibKOSVKVJASA/out.md","metadata":"{\"language\": \"English\", \"filetype\": \"pdf\", \"toc\": [], \"pages\": 4, \"ocr_stats\": {\"ocr_pages\": 0, \"ocr_failed\": 0, \"ocr_success\": 0}, \"block_stats\": {\"header_footer\": 0, \"code\": 0, \"table\": 0, \"equations\": {\"successful_ocr\": 0, \"unsuccessful_ocr\": 0, \"equations\": 0}}, \"postprocess_stats\": {\"edit\": {}}}"}
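The `metadata` field in the example output appears to be a JSON document carried as a string, not a nested object. The sketch below decodes it in a second step, assuming that is how the model returns it. `parse_marker_output` is a hypothetical helper name, and the sample is trimmed from the example output.

```python
import json


def parse_marker_output(output: dict) -> dict:
    """Return a copy of the output with `metadata` decoded into a dict.

    Assumption: `metadata` arrives as a JSON-encoded string, as the
    example output suggests. A value that is already a dict passes through.
    """
    metadata = output.get("metadata")
    if isinstance(metadata, str):
        metadata = json.loads(metadata)
    return {**output, "metadata": metadata}


# Sample trimmed from the example output.
sample = {
    "markdown": "https://replicate.delivery/pbxt/8UKRTTBKhe3uTi3sEFdYdfK9YWAvoDC50dU2NibKOSVKVJASA/out.md",
    "metadata": '{"language": "English", "filetype": "pdf", "pages": 4, '
                '"ocr_stats": {"ocr_pages": 0, "ocr_failed": 0, "ocr_success": 0}}',
}

parsed = parse_marker_output(sample)
print(parsed["metadata"]["pages"])
print(parsed["metadata"]["ocr_stats"])
```

Decoding once at the boundary keeps the rest of a pipeline working with ordinary dictionaries, for example to check `ocr_stats` before trusting the Markdown file.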

Performance Metrics

1.78s Prediction Time
1.82s Total Time
All Input Parameters
{
  "dpi": 400,
  "lang": "English",
  "document": "https://replicate.delivery/pbxt/K0onIKM1Wn5xTzan7ua67mqePVrRf6feas4sfTjbbAROkrcL/The%20Tell-Tale%20Heart.pdf",
  "enable_editor": false,
  "parallel_factor": 10
}
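The same inputs can be sent from Python. This is a minimal sketch, assuming the `replicate` client package is installed and `REPLICATE_API_TOKEN` is set in the environment. The model reference combines the model name with the version ID listed on this page, and `run_marker` is a hypothetical wrapper name.

```python
# Model name plus the version ID from "Version Details".
MODEL_REF = "cuuupid/marker:4eb62a42c3e5b8695a796936e69afa2c004839aef15410f01492d59783baf752"

# The example inputs shown above, as a Python dict.
EXAMPLE_INPUT = {
    "dpi": 400,
    "lang": "English",
    "document": "https://replicate.delivery/pbxt/K0onIKM1Wn5xTzan7ua67mqePVrRf6feas4sfTjbbAROkrcL/The%20Tell-Tale%20Heart.pdf",
    "enable_editor": False,
    "parallel_factor": 10,
}


def run_marker(model_input: dict):
    """Send one prediction request; needs network access and an API token."""
    # Imported lazily so this module loads even without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=model_input)


# Usage (performs a real API call):
#   output = run_marker(EXAMPLE_INPUT)
```

Pinning the full version ID rather than only the model name keeps results tied to the build documented here.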
Input Parameters
dpi Type: integer Default: 400
The DPI to use for OCR.
lang Default: English
Provide the language to use for OCR.
document Type: string
Provide your input file (PDF, EPUB, MOBI, XPS, FB2).
max_pages Type: integer
Provide the maximum number of pages to parse.
enable_editor Type: boolean Default: false
Enable the editor model.
parallel_factor Type: integer Default: 1
Provide the parallel factor to use for OCR.
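The parameters and defaults listed above can be gathered into a small payload builder. This is a sketch only: `build_marker_input` is a hypothetical helper, and the range checks (positive `dpi`, `parallel_factor` and `max_pages` of at least 1) are assumptions, not limits documented for the model.

```python
from typing import Optional


def build_marker_input(
    document: str,
    dpi: int = 400,
    lang: str = "English",
    max_pages: Optional[int] = None,
    enable_editor: bool = False,
    parallel_factor: int = 1,
) -> dict:
    """Build an input payload using the defaults listed on this page."""
    if not document:
        raise ValueError("document is required")
    # Assumed sanity checks; the page does not state valid ranges.
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    if parallel_factor < 1:
        raise ValueError("parallel_factor must be at least 1")
    if max_pages is not None and max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    payload = {
        "document": document,
        "dpi": dpi,
        "lang": lang,
        "enable_editor": enable_editor,
        "parallel_factor": parallel_factor,
    }
    # max_pages has no listed default, so it is sent only when given.
    if max_pages is not None:
        payload["max_pages"] = max_pages
    return payload


payload = build_marker_input("https://example.com/paper.pdf", max_pages=2)
print(payload)
```

Leaving `max_pages` out of the payload when it is unset lets the model apply its own behavior for that parameter.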
Example Execution Logs
Extracting from 11 blocks
Extracting from 9 blocks
Extracting from 17 blocks
Extracting from 9 blocks
[per-block "Extracted block N with M lines" messages, interleaved across pages in the original output]
Extracted 9 blocks
Extracted 9 blocks
Extracted 11 blocks
Extracted block 15 with 3 lines
Extracted block 16 with 1 lines
Extracted 17 blocks
Version Details
Version ID
4eb62a42c3e5b8695a796936e69afa2c004839aef15410f01492d59783baf752
Version Created
December 7, 2023