bytedance/dolphin

▢️ 939 runs πŸ“… Jun 2025 βš™οΈ Cog 0.15.5 πŸ”— GitHub βš–οΈ License
document-to-json ocr pdf-to-markdown

About

Document Image Parsing via Heterogeneous Anchor Prompting

Example Output


LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste RoziΓ¨re, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample $^⋆$

Meta AI

Abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community$^1$.

1 Introduction

Large Language Models (LLMs) trained on massive corpora of texts have shown their ability to perform new tasks from textual instructions or from a few examples (Brown et al., 2020). These few-shot properties first appeared when scaling models to a sufficient size (Kaplan et al., 2020), resulting in a line of work that focuses on further scaling these models (Chowdhery et al., 2022; Rae et al., 2021). These efforts are based on the assumption that more parameters will lead to better performance. However, recent work from Hoffmann et al. (2022) shows that, for a given compute budget, the best performances are not achieved by the largest models, but by smaller models trained on more data.

The objective of the scaling laws from Hoffmann et al. (2022) is to determine how to best scale the dataset and model sizes for a particular training compute budget. However, this objective disregards the inference budget, which becomes critical when serving a language model at scale. In this context, given a target level of performance, the preferred model is not the fastest to train but the fastest at inference, and although it may be cheaper to train a large model to reach a certain level of performance, a smaller one trained longer will ultimately be cheaper at inference. For instance, although Hoffmann et al. (2022) recommends training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens.

The focus of this work is to train a series of language models that achieve the best possible performance at various inference budgets, by training on more tokens than what is typically used. The resulting models, called LLaMA, range from 7B to 65B parameters with competitive performance compared to the best existing LLMs. For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10$\times$ smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU. At the higher end of the scale, our 65B-parameter model is also competitive with the best large language models such as Chinchilla or PaLM-540B.

Unlike Chinchilla, PaLM, or GPT-3, we only use publicly available data, making our work compatible with open-sourcing, while most existing models rely on data which is either not publicly available or undocumented (e.g. β€œBooks – 2TB” or β€œSocial media conversations”). There exist some exceptions, notably OPT (Zhang et al., 2022), GPT-NeoX (Black et al., 2022), BLOOM (Scao et al., 2022) and GLM (Zeng et al., 2022), but none that are competitive with PaLM-62B or Chinchilla.

In the rest of this paper, we present an overview of the modifications we made to the transformer architecture (Vaswani et al., 2017), as well as our training method. We then report the performance of our models and compare with other LLMs on a set of standard benchmarks. Finally, we expose some of the biases and toxicity encoded in our models, using some of the most recent benchmarks from the responsible AI community.

⋆ Equal contribution. Correspondence: (htouvron, thibautlav, gizacard, egrave, glample)@meta.com

https://github.com/facebookresearch/llama

arXiv:2302.13971v1 [cs.CL] 27 Feb 2023

Performance Metrics

8.31s Prediction Time
120.63s Total Time
All Input Parameters
{
  "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
  "output_format": "markdown_content"
}
Input Parameters
file (required, string): PDF or image file to parse
output_format (default: markdown_content): Output format
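
A minimal sketch of calling this model with the official replicate Python client, using only the two parameters documented above. The version hash is copied from the Version Details section below; the input URL is a placeholder, and REPLICATE_API_TOKEN is assumed to be set in the environment.

import replicate

# Run the pinned version of bytedance/dolphin on a single-page PDF.
output = replicate.run(
    "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    input={
        "file": "https://example.com/page_1.pdf",  # placeholder: PDF or image file to parse
        "output_format": "markdown_content",       # default output format
    },
)
print(output)  # parsed document content, e.g. the Markdown shown in Example Output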
Output Schema

Output

Example Execution Logs
2025-06-20 21:45:00.684 | INFO     | predict:_process_document:86 - Document has 1 page(s)
2025-06-20 21:45:00.871 | INFO     | predict:_process_document:107 - Processing page 1/1
Legacy behavior is being used. The current behavior will be deprecated in version 5.0.0. In the new behavior, if both images and text are provided, the default value of `add_special_tokens` will be changed to `False` when calling the tokenizer if `add_special_tokens` is unset. To test the new behavior, set `legacy=False`as a processor call argument.
Version Details
Version ID
19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85
Version Created
June 20, 2025
Run on Replicate β†’
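
For completeness, a hedged sketch of the same call made directly against Replicate's public HTTP API with requests, which makes the create-then-poll lifecycle explicit and helps explain why total time (120.63s above) can far exceed prediction time (8.31s). The file URL is a placeholder and the 2-second polling interval is arbitrary.

import os
import time
import requests

headers = {
    "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
    "Content-Type": "application/json",
}

# Create a prediction against the version listed above.
prediction = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers=headers,
    json={
        "version": "19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
        "input": {
            "file": "https://example.com/page_1.pdf",  # placeholder input URL
            "output_format": "markdown_content",
        },
    },
).json()

# Poll until the prediction settles; queueing and cold starts account for the
# gap between prediction time and total time reported in Performance Metrics.
while prediction["status"] not in ("succeeded", "failed", "canceled"):
    time.sleep(2)
    prediction = requests.get(prediction["urls"]["get"], headers=headers).json()

print(prediction.get("output"))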