meta/llama-4-scout-instruct

Official · 3.4M runs · Apr 2025 · Cog 0.16.9
code-generation question-answering text-generation

About

A mixture-of-experts model with 17 billion active parameters and 16 experts

Example Output

Prompt:

"Hello, Llama!"

Output

Hello! It's nice to meet you. I'm Llama, a large language model developed by Meta. How can I assist you today?

Performance Metrics

0.55s Prediction Time
0.56s Total Time
All Input Parameters
{
  "top_p": 1,
  "prompt": "Hello, Llama!",
  "max_tokens": 1024,
  "temperature": 0.6,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
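
The same request can be reproduced with the Replicate Python client. A minimal sketch, assuming the standard replicate package and the model name shown on this page (version pinning omitted):

import replicate

# Inputs mirror the "All Input Parameters" JSON above.
output = replicate.run(
    "meta/llama-4-scout-instruct",
    input={
        "top_p": 1,
        "prompt": "Hello, Llama!",
        "max_tokens": 1024,
        "temperature": 0.6,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)

# The output schema is an array of strings (see below), so join the chunks.
print("".join(output))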
Input Parameters
top_k Type: integer, Default: 50
The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
top_p Type: number, Default: 0.9
A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). A sketch of how top-k, top-p, and temperature combine appears after this parameter list.
prompt Type: string, Default: (empty)
Prompt
max_tokens Type: integer, Default: 4096, Range: 0 - 131072
The maximum number of tokens the model should generate as output.
min_tokens Type: integer, Default: 0
The minimum number of tokens the model should generate as output.
temperature Type: number, Default: 0.6
The value used to modulate the next token probabilities.
system_prompt Type: string, Default: You are a helpful assistant.
System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
stop_sequences Type: string, Default: (empty)
A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
prompt_template Type: string, Default: (empty)
A template to format the prompt with. If not provided, the default prompt template will be used.
presence_penalty Type: number, Default: 0
Presence penalty
frequency_penalty Type: number, Default: 0
Frequency penalty
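
To make the sampling parameters concrete, here is a minimal sketch of how temperature, top-k, and nucleus (top-p) filtering combine, assuming a 1-D NumPy array of raw next-token logits. This illustrates the standard technique, not this model's internal implementation.

import numpy as np

def sample_next_token(logits, temperature=0.6, top_k=50, top_p=0.9):
    # Temperature rescales logits before softmax (lower = sharper).
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth_best = np.sort(logits)[-min(top_k, logits.size)]
        logits = np.where(logits < kth_best, -np.inf, logits)
    # Softmax over the surviving tokens.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p: keep the smallest set of tokens whose cumulative
    # probability reaches top_p (Holtzman et al., 2019).
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        drop = cum > top_p
        drop[1:] = drop[:-1].copy()  # keep the token that crosses top_p
        drop[0] = False
        probs[order[drop]] = 0.0
        probs /= probs.sum()
    return np.random.choice(probs.size, p=probs)

With this model's defaults (temperature 0.6, top_k 50, top_p 0.9), roughly the 50 most probable tokens are considered, then trimmed to the tokens covering the top 90% of probability mass before sampling.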
Output Schema

Output

Type: array, Items Type: string
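
Because the output arrives as an array of string chunks, it can be consumed incrementally as well as joined at the end. A sketch, assuming the Replicate Python client's streaming helper:

import replicate

# Each event is one chunk of the output stream; str(event) is its text.
for event in replicate.stream(
    "meta/llama-4-scout-instruct",
    input={"prompt": "Hello, Llama!"},
):
    print(str(event), end="")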

Example Execution Logs
Prompt: Hello, Llama!
Input token count: 5
Output token count: 29
TTFT: 0.23s
Tokens per second: 53.68
Total time: 0.54s
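
The logged throughput is consistent with output tokens divided by total time: 29 / 0.54 s ≈ 53.7 tokens per second. A small sketch of that arithmetic; the decode-only variant is an assumption, not a metric the logs report:

# Values from the execution logs above.
output_tokens = 29
ttft = 0.23        # time to first token, seconds
total_time = 0.54  # seconds

# Reported metric: tokens per second over the whole request window.
tokens_per_second = output_tokens / total_time           # ~53.7
# Hypothetical decode-only rate, excluding first-token latency.
decode_rate = (output_tokens - 1) / (total_time - ttft)  # ~90.3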
Version Details
Version ID
e1ce7061df35889e7846dc7ca71e4aa93fad6efcc9fd4ecd6ac5c36b533f3c06
Version Created
November 28, 2025