meta/meta-llama-3-8b 🔢📝 → 📝

⭐ Official ▶️ 51.1M runs 📅 Apr 2024 ⚙️ Cog 0.9.4 🔗 GitHub ⚖️ License
code-generation question-answering text-generation text-translation

About

Base version of Llama 3, an 8-billion-parameter language model from Meta.

Example Output

Prompt:

"

Story title: 3 llamas go for a walk
Summary: The 3 llamas crossed a bridge and something unexpected happened

Once upon a time

"

Output

there were 3 llamas. They were very friendly and they decided to go for a walk.
They crossed a bridge, but something unexpected happened.

The first llama was walking on the left side of the bridge, when suddenly he saw a snake.
He got scared and ran away as fast as he could.

The second llama was walking in the middle of the bridge, when suddenly he saw a dog.
He got scared and ran away as fast as he could.

The third llama was walking on the right side of the bridge, when suddenly he saw a cat.
He got scared and ran away as fast as he could.

All 3 llamas ran away from the bridge and never came back. The end.

Performance Metrics

1.87s Prediction Time
1.77s Total Time
All Input Parameters
{
  "top_p": 0.9,
  "prompt": "Story title: 3 llamas go for a walk\nSummary: The 3 llamas crossed a bridge and something unexpected happened\n\nOnce upon a time",
  "max_tokens": 512,
  "min_tokens": 0,
  "temperature": 0.6,
  "prompt_template": "{prompt}",
  "presence_penalty": 1.15,
  "frequency_penalty": 0.2
}
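
The same prediction can be reproduced from code. A minimal sketch, assuming the official `replicate` Python client (`pip install replicate`) and a `REPLICATE_API_TOKEN` in the environment; the input dictionary mirrors the JSON above:

```python
import replicate

# Input mirrors the "All Input Parameters" JSON above.
output = replicate.run(
    "meta/meta-llama-3-8b",
    input={
        "top_p": 0.9,
        "prompt": (
            "Story title: 3 llamas go for a walk\n"
            "Summary: The 3 llamas crossed a bridge and something unexpected happened\n\n"
            "Once upon a time"
        ),
        "max_tokens": 512,
        "min_tokens": 0,
        "temperature": 0.6,
        "prompt_template": "{prompt}",
        "presence_penalty": 1.15,
        "frequency_penalty": 0.2,
    },
)

# The output schema is an array of strings (see "Output Schema" below),
# so join the pieces to get the full completion.
print("".join(output))
```
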
Input Parameters
top_k Type: integer, Default: 50
The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
top_p Type: number, Default: 0.9
A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). See the sketch after this parameter list for how top_k, top_p, and temperature interact.
prompt Type: string, Default: (empty)
Prompt
max_tokens Type: integer, Default: 512
The maximum number of tokens the model should generate as output.
min_tokens Type: integer, Default: 0
The minimum number of tokens the model should generate as output.
temperature Type: number, Default: 0.6
The value used to modulate the next token probabilities.
prompt_template Type: string, Default: {prompt}
Prompt template. The string `{prompt}` will be substituted for the input prompt. If you want to generate dialog output, use this template as a starting point and construct the prompt string manually, leaving `prompt_template={prompt}`.
presence_penalty Type: number, Default: 1.15
Presence penalty. Positive values penalize tokens that have already appeared in the output, encouraging the model to introduce new tokens.
frequency_penalty Type: number, Default: 0.2
Frequency penalty. Positive values penalize tokens in proportion to how often they have already appeared in the output, discouraging repetition.
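
The top_k, top_p, and temperature parameters all act on the next-token distribution before a token is sampled. The sketch below is purely illustrative (plain NumPy over a fake vocabulary, not the vLLM decoding code this model actually runs): temperature rescales the logits, top-k keeps the k most likely tokens, and top-p keeps the smallest set of top tokens whose cumulative probability reaches `top_p`.

```python
import numpy as np

def sample_next_token(logits, temperature=0.6, top_k=50, top_p=0.9, rng=None):
    """Illustrative top-k / top-p (nucleus) sampling over one logits vector."""
    rng = rng or np.random.default_rng()

    # Temperature modulates the next-token probabilities (lower = sharper).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    keep = np.ones_like(probs, dtype=bool)

    # top_k: keep only the k highest-probability tokens (disabled when top_k <= 0).
    if top_k > 0:
        keep &= probs >= np.sort(probs)[-min(top_k, probs.size)]

    # top_p: keep the smallest set of top tokens whose cumulative probability >= top_p.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]                    # most to least likely
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        nucleus = np.zeros_like(keep)
        nucleus[order[:cutoff]] = True
        keep &= nucleus

    filtered = np.where(keep, probs, 0.0)
    return int(rng.choice(len(probs), p=filtered / filtered.sum()))

# Toy usage with a fake 128-token vocabulary.
rng = np.random.default_rng(0)
token_id = sample_next_token(rng.normal(size=128), rng=rng)
```
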
Output Schema

Output

Type: array, Items Type: string
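
Each item in the array is a chunk of generated text, so callers typically print chunks as they arrive or join them into one string. A minimal streaming sketch, again assuming the `replicate` Python client (the short prompt and `max_tokens` value are only for illustration):

```python
import replicate

# Each item yielded for this model is a string chunk of the completion,
# matching the "array of strings" output schema above.
for chunk in replicate.run(
    "meta/meta-llama-3-8b",
    input={"prompt": "Once upon a time", "max_tokens": 64},
):
    print(chunk, end="")
print()
```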

Example Execution Logs
INFO 04-18 17:21:53 async_llm_engine.py:508] Received request 4020a08dfa1e411785b84ec42c00c6c7: prompt: 'Story title: 3 llamas go for a walk\nSummary: The 3 llamas crossed a bridge and something unexpected happened\n\nOnce upon a time', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=1.15, frequency_penalty=0.2, repetition_penalty=1.0, temperature=0.6, top_p=0.9, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|end_of_text|>', '<|eot_id|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-18 17:21:54 metrics.py:218] Avg prompt throughput: 6.2 tokens/s, Avg generation throughput: 18.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-18 17:21:55 async_llm_engine.py:120] Finished request 4020a08dfa1e411785b84ec42c00c6c7.
Tokens/second: 79.92277125263657
Version Details
Version ID
9a9e68fc8695f5847ce944a5cecf9967fd7c64d0fb8c8af1d5bdcc71f03c5e47
Version Created
April 17, 2024
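
To keep results reproducible as the model is updated, a request can be pinned to this exact build by appending the version ID above to the model name. A minimal sketch, assuming the `replicate` Python client (prompt and `max_tokens` chosen only for illustration):

```python
import replicate

# "owner/name:version" pins the prediction to the build listed above.
output = replicate.run(
    "meta/meta-llama-3-8b:9a9e68fc8695f5847ce944a5cecf9967fd7c64d0fb8c8af1d5bdcc71f03c5e47",
    input={"prompt": "Once upon a time", "max_tokens": 64},
)
print("".join(output))
```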