meta/llama-4-scout-instruct
About
A 17-billion-parameter mixture-of-experts model with 16 experts.
Example Output
Prompt:
"Hello, Llama!"
Output:
Hello! It's nice to meet you. I'm Llama, a large language model developed by Meta. How can I assist you today?
Performance Metrics
- Prediction Time: 0.55s
- Total Time: 0.56s
All Input Parameters
{
  "top_p": 1,
  "prompt": "Hello, Llama!",
  "max_tokens": 1024,
  "temperature": 0.6,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
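These parameters correspond directly to an API request. Below is a minimal sketch assuming the Replicate Python client (the replicate package) and an API token in the environment; the exact client, model reference, and output handling may differ in your setup.

# Minimal sketch: invoking the model with the parameters shown above.
# Assumes the Replicate Python client (pip install replicate) and a
# REPLICATE_API_TOKEN environment variable; adjust for your own setup.
# Depending on the platform, the model reference may also need the
# version ID listed under "Version Details" appended to it.
import replicate

output = replicate.run(
    "meta/llama-4-scout-instruct",
    input={
        "prompt": "Hello, Llama!",
        "max_tokens": 1024,
        "temperature": 0.6,
        "top_p": 1,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)

# Language models on this API typically stream tokens; joining the pieces
# reconstructs the full response string.
print("".join(output))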
Input Parameters
- top_k: The number of highest-probability tokens to consider when generating the output. If > 0, only the top k most probable tokens are kept (top-k filtering).
- top_p: A probability threshold for generating the output. If < 1.0, only the smallest set of tokens whose cumulative probability is >= top_p is kept (nucleus filtering), as described in Holtzman et al. (http://arxiv.org/abs/1904.09751). A small sketch of both filters follows this list.
- prompt: The prompt to send to the model.
- max_tokens: The maximum number of tokens the model should generate as output.
- min_tokens: The minimum number of tokens the model should generate as output.
- temperature: The value used to modulate the next-token probabilities.
- system_prompt: System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
- stop_sequences: A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
- prompt_template: A template used to format the prompt. If not provided, the default prompt template is used.
- presence_penalty: Presence penalty; penalizes tokens that have already appeared in the output, encouraging the model to introduce new tokens.
- frequency_penalty: Frequency penalty; penalizes tokens in proportion to how often they have already appeared in the output.
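To make the top_k and top_p descriptions above concrete, here is a small NumPy sketch of both filters applied to a toy next-token distribution. This is an illustration of the standard technique, not the model's actual sampler; the array values are made up.

# Illustrative top-k and top-p (nucleus) filtering over a toy distribution.
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    if k <= 0:
        return probs
    kept = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]          # indices of the k largest probabilities
    kept[top] = probs[top]
    return kept / kept.sum()

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    if p >= 1.0:
        return probs
    order = np.argsort(probs)[::-1]       # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # number of tokens to keep
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]
    return kept / kept.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])      # toy next-token distribution
print(top_k_filter(probs, k=2))   # only the two most probable tokens survive
print(top_p_filter(probs, p=0.8)) # tokens kept until cumulative probability >= 0.8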
Output Schema
Example Execution Logs
Prompt: Hello, Llama!
Input token count: 5
Output token count: 29
TTFT: 0.23s
Tokens per second: 53.68
Total time: 0.54s
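As a rough sanity check, the reported throughput follows from the log values: about 29 output tokens over roughly 0.54 s works out to ≈ 53.7 tokens per second, in line with the reported 53.68.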
Version Details
- Version ID: e1ce7061df35889e7846dc7ca71e4aa93fad6efcc9fd4ecd6ac5c36b533f3c06
- Version Created: November 28, 2025