nvidia/nemotron-3-nano-30b-a3b
About
Nemotron-3-Nano-30B-A3B is a large language model (LLM) trained from scratch by NVIDIA.
Example Output
Prompt:
"write a haiku about GPUs"
Output
Silicon hearts beat,
Parallel dreams in silicon –
Lightning renders fast.
Performance Metrics
44.38s
Prediction Time
44.49s
Total Time
All Input Parameters
{
"top_k": 50,
"top_p": 1,
"prompt": "write a haiku about GPUs",
"temperature": 1,
"system_prompt": "",
"max_new_tokens": 256,
"enable_thinking": false,
"repetition_penalty": 1.1
}
Input Parameters
- top_k
- Top-k sampling. Lower values make output more focused
- top_p
- Top-p (nucleus) sampling. Use 1.0 for reasoning tasks, 0.95 for tool calling
- prompt (required)
- Input prompt for the model
- temperature
- Temperature for sampling. Use 1.0 for reasoning tasks, 0.6 for tool calling
- system_prompt
- System prompt to guide model behavior (optional)
- max_new_tokens
- Maximum number of tokens to generate
- enable_thinking
- Enable reasoning/thinking mode for complex problems. Set to False for greedy search
- repetition_penalty
- Penalty for repeating tokens. Higher values reduce repetition
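The parameters above are standard decoding knobs, applied in a conventional order: penalize repeated tokens, scale logits by temperature, then restrict the candidate set with top-k and top-p before sampling. The sketch below is a hypothetical illustration of that pipeline (the function name and greedy-fallback behavior are assumptions, not the model's actual serving code):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=1.0,
                      repetition_penalty=1.1, previous_tokens=()):
    """Illustrative sketch: pick one token id from raw logits using the
    decoding parameters listed above. Not the model's serving code."""
    logits = list(logits)

    # Repetition penalty: push down the logits of already-generated tokens
    # (divide positive logits, multiply negative ones).
    for t in set(previous_tokens):
        if logits[t] > 0:
            logits[t] /= repetition_penalty
        else:
            logits[t] *= repetition_penalty

    # Temperature <= 0 is treated here as greedy decoding (an assumption).
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token ids.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]

    # Top-p (nucleus): trim to the smallest prefix whose mass reaches top_p.
    kept, cum = [], 0.0
    for i in ranked:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving candidates and sample.
    mass = sum(probs[i] for i in kept)
    return random.choices(kept, weights=[probs[i] / mass for i in kept])[0]
```

For example, `top_k=1` collapses the candidate set to the single most likely token, which is why lower `top_k` values make the output more focused.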
Output Schema
Output
Example Execution Logs
NemotronH requires an initialized `NemotronHHybridDynamicCache` to return a cache. None was provided, so no cache will be returned.
Version Details
- Version ID
135b4a9c545002830563436c88ea56b401d135faa59da6773bc5934d2ae56344
- Version Created
- December 15, 2025