meta/meta-llama-3.1-405b-instruct 🔢📝 → 📝

⭐ Official ▶️ 6.6M runs 📅 Jul 2024 ⚙️ Cog 0.16.8 🔗 GitHub ⚖️ License
code-generation text-generation text-translation

About

Meta's flagship 405-billion-parameter language model, fine-tuned for chat completions

Example Output

Prompt:

"Tina has one brother and one sister. How many sisters do Tina's siblings have?"

Output

Tina has one brother and one sister. From the brother's perspective, he has one sister, Tina, and also the other sister. So, Tina's brother has 2 sisters.

From the sister's perspective, she also has one sister, Tina.

So, Tina's siblings have a total of 2 sisters (from the brother's perspective) and 1 sister (from the sister's perspective).

Performance Metrics

2.83s Prediction Time
2.86s Total Time
All Input Parameters
{
  "top_p": 0.9,
  "prompt": "Tina has one brother and one sister. How many sisters do Tina's siblings have?",
  "max_tokens": 1024,
  "min_tokens": 0,
  "temperature": 0.6,
  "system_prompt": "You are a helpful assistant.",
  "presence_penalty": 0,
  "frequency_penalty": 0
}
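
The request above can be reproduced with the Replicate Python client. This is a minimal sketch, assuming the replicate package is installed (pip install replicate) and REPLICATE_API_TOKEN is set in the environment:

# Minimal sketch: run this model with the input parameters shown above.
import replicate

output = replicate.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={
        "top_p": 0.9,
        "prompt": "Tina has one brother and one sister. How many sisters do Tina's siblings have?",
        "max_tokens": 1024,
        "min_tokens": 0,
        "temperature": 0.6,
        "system_prompt": "You are a helpful assistant.",
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)

# The output arrives as a list of string chunks (see Output Schema below).
print("".join(output))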
Input Parameters
top_k Type: integer, Default: 50
The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
top_p Type: number, Default: 0.9
A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). A sketch of how top_k, top_p, and temperature interact follows this parameter list.
prompt Type: string, Default: (none)
Prompt
max_tokens Type: integer, Default: 512
The maximum number of tokens the model should generate as output.
min_tokens Type: integer, Default: 0
The minimum number of tokens the model should generate as output.
temperature Type: number, Default: 0.6
The value used to modulate the next token probabilities.
system_prompt Type: string, Default: You are a helpful assistant.
System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
stop_sequences Type: string, Default: (none)
A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
prompt_template Type: string, Default: (none)
A template to format the prompt with. If not provided, the default prompt template will be used.
presence_penalty Type: number, Default: 0
Presence penalty
frequency_penalty Type: number, Default: 0
Frequency penalty
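
The sampling parameters above act together at each decoding step. The following is an illustrative sketch only (not the model's actual serving code) of how temperature, top_k, and top_p restrict the next-token distribution before sampling:

import numpy as np

def filter_next_token_probs(logits, temperature=0.6, top_k=50, top_p=0.9):
    # Temperature scaling: lower values sharpen the distribution.
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]        # token ids sorted by probability, descending
    keep = np.zeros(probs.shape, dtype=bool)

    # top-k filtering: keep only the k highest-probability tokens (k <= 0 disables it).
    k = top_k if top_k > 0 else len(probs)
    keep[order[:k]] = True

    # top-p (nucleus) filtering: keep the smallest prefix whose cumulative
    # probability reaches top_p (disabled when top_p >= 1.0).
    if top_p < 1.0:
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1
        nucleus = np.zeros(probs.shape, dtype=bool)
        nucleus[order[:cutoff]] = True
        keep &= nucleus

    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()       # renormalized distribution to sample from

# Example: draw one token id from the filtered distribution.
rng = np.random.default_rng(0)
probs = filter_next_token_probs([2.0, 1.0, 0.5, -1.0, -3.0])
next_token_id = int(rng.choice(len(probs), p=probs))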
Output Schema

Output

Type: array, Items Type: string
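
Because the output type is an array of strings, the response is delivered as incremental text chunks. A sketch of streaming them as they arrive, assuming a replicate client version that exposes replicate.stream:

import replicate

# Print chunks as the model produces them rather than waiting for the full answer.
for event in replicate.stream(
    "meta/meta-llama-3.1-405b-instruct",
    input={"prompt": "Summarize nucleus sampling in one sentence.", "max_tokens": 128},
):
    print(str(event), end="")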

Example Execution Logs
INFO 07-23 15:04:21 async_llm_engine.py:173] Added request 3cd670c9b87b469ea9721ddfbcc6b67a.
INFO 07-23 15:04:21 metrics.py:396] Avg prompt throughput: 7.8 tokens/s, Avg generation throughput: 17.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
INFO 07-23 15:04:24 async_llm_engine.py:140] Finished request 3cd670c9b87b469ea9721ddfbcc6b67a.
Generation took 1721746720.08s
Formatted prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Tina has one brother and one sister. How many sisters do Tina's siblings have?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
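
The "Formatted prompt" in the logs is the system_prompt and prompt folded into the Llama 3.1 chat template (a custom template can be supplied via prompt_template). A sketch of that assembly, with the trailing newlines after the assistant header assumed from the standard template:

# Reconstruction of the default Llama 3.1 chat template shown in the logs above.
LLAMA_31_CHAT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

formatted_prompt = LLAMA_31_CHAT_TEMPLATE.format(
    system_prompt="You are a helpful assistant.",
    prompt="Tina has one brother and one sister. How many sisters do Tina's siblings have?",
)
print(formatted_prompt)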
Version Details
Version ID
34fa6c1615b8423463ae7abc2ed9288423a501cab25f33b9482ed8d5295b1cd9
Version Created
October 13, 2025
Run on Replicate →