meta/meta-llama-3-8b 🔢📝 → 📝
About
Base version of Llama 3, an 8 billion parameter language model from Meta.

Example Output
"
Story title: 3 llamas go for a walk
Summary: The 3 llamas crossed a bridge and something unexpected happened
Once upon a time
"Output
there were 3 llamas. They were very friendly and they decided to go for a walk.
They crossed a bridge, but something unexpected happened.
The first llama was walking on the left side of the bridge, when suddenly he saw a snake.
He got scared and ran away as fast as he could.
The second llama was walking in the middle of the bridge, when suddenly he saw a dog.
He got scared and ran away as fast as he could.
The third llama was walking on the right side of the bridge, when suddenly he saw a cat.
He got scared and ran away as fast as he could.
All 3 llamas ran away from the bridge and never came back. The end.
All Input Parameters
{ "top_p": 0.9, "prompt": "Story title: 3 llamas go for a walk\nSummary: The 3 llamas crossed a bridge and something unexpected happened\n\nOnce upon a time", "max_tokens": 512, "min_tokens": 0, "temperature": 0.6, "prompt_template": "{prompt}", "presence_penalty": 1.15, "frequency_penalty": 0.2 }
Input Parameters
- top_k: The number of highest-probability tokens to consider when generating output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering).
- top_p: A probability threshold for generating the output. If < 1.0, only the smallest set of top tokens whose cumulative probability reaches top_p is kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). See the illustrative sketch after this list.
- prompt: The text prompt sent to the model.
- max_tokens: The maximum number of tokens the model should generate as output.
- min_tokens: The minimum number of tokens the model should generate as output.
- temperature: The value used to modulate the next-token probabilities.
- prompt_template: Prompt template. The string `{prompt}` will be substituted for the input prompt. If you want to generate dialog output, use this template as a starting point and construct the prompt string manually, leaving `prompt_template={prompt}`.
- presence_penalty: Presence penalty. Penalizes tokens that have already appeared in the generated text, encouraging new tokens.
- frequency_penalty: Frequency penalty. Penalizes tokens in proportion to how often they have already appeared in the generated text, reducing repetition.
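To make the top_k and top_p descriptions concrete, here is a small illustrative sketch of how top-k and nucleus filtering restrict a next-token distribution before sampling. This is a toy example for explanation only, not the model's actual sampling code (which, per the execution logs, runs inside vLLM).

```python
# Illustrative top-k / top-p (nucleus) filtering over a toy distribution.
import numpy as np

def filter_top_k_top_p(probs: np.ndarray, top_k: int = 50, top_p: float = 0.9) -> np.ndarray:
    """Zero out tokens excluded by top-k and nucleus filtering, then renormalize."""
    order = np.argsort(probs)[::-1]           # token indices, most probable first
    sorted_probs = probs[order]

    keep = np.ones_like(sorted_probs, dtype=bool)
    if top_k > 0:
        keep[top_k:] = False                   # keep only the k most probable tokens
    if top_p < 1.0:
        cumulative = np.cumsum(sorted_probs)
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        cutoff = np.searchsorted(cumulative, top_p) + 1
        keep[cutoff:] = False

    filtered = np.zeros_like(probs)
    filtered[order[keep]] = sorted_probs[keep]
    return filtered / filtered.sum()

# Toy distribution over 6 candidate tokens.
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])
print(filter_top_k_top_p(probs, top_k=4, top_p=0.9))
# Only the first four tokens survive; their probabilities are renormalized.
```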
Example Execution Logs
INFO 04-18 17:21:53 async_llm_engine.py:508] Received request 4020a08dfa1e411785b84ec42c00c6c7: prompt: 'Story title: 3 llamas go for a walk\nSummary: The 3 llamas crossed a bridge and something unexpected happened\n\nOnce upon a time', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=1.15, frequency_penalty=0.2, repetition_penalty=1.0, temperature=0.6, top_p=0.9, top_k=50, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|end_of_text|>', '<|eot_id|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=512, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: None, lora_request: None.
INFO 04-18 17:21:54 metrics.py:218] Avg prompt throughput: 6.2 tokens/s, Avg generation throughput: 18.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-18 17:21:55 async_llm_engine.py:120] Finished request 4020a08dfa1e411785b84ec42c00c6c7.
Tokens/second: 79.92277125263657
Version Details
- Version ID: 9a9e68fc8695f5847ce944a5cecf9967fd7c64d0fb8c8af1d5bdcc71f03c5e47
- Version Created: April 17, 2024