bytedance/dolphin
About
Document Image Parsing via Heterogeneous Anchor Prompting

Example Output
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample $^*$
Meta AI
Abstract
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community $^1$.
1 Introduction
Large Language Models (LLMs) trained on massive corpora of texts have shown their ability to perform new tasks from textual instructions or from a few examples (Brown et al., 2020). These few-shot properties first appeared when scaling models to a sufficient size (Kaplan et al., 2020), resulting in a line of work that focuses on further scaling these models (Chowdhery et al., 2022; Rae et al., 2021). These efforts are based on the assumption that more parameters will lead to better performance. However, recent work from Hoffmann et al. (2022) shows that, for a given compute budget, the best performances are not achieved by the largest models, but by smaller models trained on more data.
The objective of the scaling laws from Hoffmann et al. (2022) is to determine how to best scale the dataset and model sizes for a particular training compute budget. However, this objective disregards the inference budget, which becomes critical when serving a language model at scale. In this context, given a target level of performance, the preferred model is not the fastest to train but the fastest at inference, and although it may be cheaper to train a large model to reach a certain level of performance, a smaller one trained longer will ultimately be cheaper at inference. For instance, although Hoffmann et al. (2022) recommend training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens.
The focus of this work is to train a series of language models that achieve the best possible performance at various inference budgets, by training on more tokens than what is typically used. The resulting models, called LLaMA, range from 7B to 65B parameters with competitive performance compared to the best existing LLMs. For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10$\times$ smaller. We believe that this model will help democratize access to and the study of LLMs, since it can be run on a single GPU. At the higher end of the scale, our 65B-parameter model is also competitive with the best large language models such as Chinchilla or PaLM-540B.
Unlike Chinchilla, PaLM, or GPT-3, we only use publicly available data, making our work compatible with open-sourcing, while most existing models rely on data which is either not publicly available or undocumented (e.g. "Books - 2TB" or "Social media conversations"). There exist some exceptions, notably OPT (Zhang et al., 2022), GPT-NeoX (Black et al., 2022), BLOOM (Scao et al., 2022) and GLM (Zeng et al., 2022), but none that are competitive with PaLM-62B or Chinchilla.
In the rest of this paper, we present an overview of the modifications we made to the transformer architecture (Vaswani et al., 2017), as well as our training method. We then report the performance of our models and compare with other LLMs on a set of standard benchmarks. Finally, we expose some of the biases and toxicity encoded in our models, using some of the most recent benchmarks from the responsible AI community.
$^*$ Equal contribution. Correspondence: {htouvron, thibautlav, gizacard, egrave, glample}@meta.com
https://github.com/facebookresearch/llam
arXiv:2302.13971v1 [cs.CL] 27 Feb 2023
All Input Parameters
{ "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf", "output_format": "markdown_content" }
Input Parameters
- file (required): PDF or image file to parse
- output_format: Output format; the example request above uses "markdown_content"
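A minimal sketch of invoking the model with these inputs through the Replicate Python client. The model reference, version ID, and example file URL come from this page; the client setup (the `replicate` package and a `REPLICATE_API_TOKEN` environment variable) is assumed.

```python
# Minimal sketch: run bytedance/dolphin via the Replicate Python client.
# Assumes `pip install replicate` and REPLICATE_API_TOKEN set in the environment.
import replicate

output = replicate.run(
    "bytedance/dolphin:19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
    input={
        "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
        "output_format": "markdown_content",
    },
)
print(output)  # parsed document content in the requested format
```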
Example Execution Logs
2025-06-20 21:45:00.684 | INFO | predict:_process_document:86 - Document has 1 page(s)
2025-06-20 21:45:00.871 | INFO | predict:_process_document:107 - Processing page 1/1
Legacy behavior is being used. The current behavior will be deprecated in version 5.0.0. In the new behavior, if both images and text are provided, the default value of `add_special_tokens` will be changed to `False` when calling the tokenizer if `add_special_tokens` is unset. To test the new behavior, set `legacy=False` as a processor call argument.
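The `legacy` warning in these logs comes from the Hugging Face `transformers` processor used under the hood; the log itself suggests passing `legacy=False` as a processor call argument to opt into the new behavior. A hypothetical sketch of such a call, where the checkpoint name and prompt string are assumptions and only the `legacy=False` argument is taken from the log:

```python
# Hypothetical sketch of silencing the deprecation warning shown in the logs.
# The checkpoint name ("ByteDance/Dolphin") and the prompt text are assumptions;
# only the `legacy=False` processor call argument comes from the log message.
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("ByteDance/Dolphin")  # assumed checkpoint name
image = Image.open("page_1.png")  # a rendered page image

# Passing legacy=False opts into the new add_special_tokens handling
# when both images and text are provided, as the warning describes.
inputs = processor(
    images=image,
    text="Parse the reading order of this document.",  # assumed page-level prompt
    legacy=False,
    return_tensors="pt",
)
```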
Version Details
- Version ID: 19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85
- Version Created: June 20, 2025
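To pin this exact version when calling the HTTP API directly, a prediction request can reference the version ID above. A minimal sketch using `requests`; the endpoint and payload shape follow Replicate's predictions API, and the API token is assumed to be set in the environment.

```python
# Minimal sketch: create a prediction pinned to the version ID listed above
# via Replicate's HTTP predictions API. Assumes the `requests` package and a
# REPLICATE_API_TOKEN environment variable.
import os
import requests

resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers={
        "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={
        "version": "19f1ad93970c2bf21442a842d01d97fb04a94a69d2b36dee43531a9cbae07e85",
        "input": {
            "file": "https://replicate.delivery/pbxt/NDm0JLRLZiYuq9mDaTixxeV2eDy6GyFtQgWzQvC0a2h1npA3/page_1.pdf",
            "output_format": "markdown_content",
        },
    },
)
resp.raise_for_status()
print(resp.json()["id"])  # prediction ID; poll the prediction until it completes
```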