nvidia/nemotron-3-nano-30b-a3b

▢️ 261 runs πŸ“… Dec 2025 βš™οΈ Cog 0.16.9 πŸ”— GitHub βš–οΈ License
code-generation long-context text-generation

About

Nemotron-3-Nano-30B-A3B is a large language model (LLM) trained from scratch by NVIDIA.

Example Output

Prompt:

"write a haiku about GPUs"

Output

Silicon hearts beat,
Parallel dreams in siliconβ€”
Lightning renders fast.

Performance Metrics

44.38s Prediction Time
44.49s Total Time
All Input Parameters
{
  "top_k": 50,
  "top_p": 1,
  "prompt": "write a haiku about GPUs",
  "temperature": 1,
  "system_prompt": "",
  "max_new_tokens": 256,
  "enable_thinking": false,
  "repetition_penalty": 1.1
}
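The example run above can be reproduced with the Replicate Python client (`pip install replicate`). This is a minimal sketch: the payload mirrors the "All Input Parameters" JSON, the model identifier comes from this page, and the API call is only attempted when a `REPLICATE_API_TOKEN` is set in the environment.

```python
import os

# Input payload mirroring the example run shown above.
payload = {
    "prompt": "write a haiku about GPUs",
    "system_prompt": "",
    "temperature": 1,
    "top_p": 1,
    "top_k": 50,
    "max_new_tokens": 256,
    "repetition_penalty": 1.1,
    "enable_thinking": False,
}

if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # third-party client: pip install replicate

    # The output schema is an array of strings; join the chunks into one text.
    output = replicate.run("nvidia/nemotron-3-nano-30b-a3b", input=payload)
    print("".join(output))
```

Per the schema below, `enable_thinking` defaults to true; the example run disabled it, which is why the haiku contains no reasoning trace.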
Input Parameters
top_k | Type: integer | Default: 50 | Range: 0 to 100
Top-k sampling. Lower values make output more focused.
top_p | Type: number | Default: 1 | Range: 0 to 1
Top-p (nucleus) sampling. Use 1.0 for reasoning tasks, 0.95 for tool calling.
prompt (required) | Type: string
Input prompt for the model.
temperature | Type: number | Default: 1 | Range: 0 to 2
Temperature for sampling. Use 1.0 for reasoning tasks, 0.6 for tool calling.
system_prompt | Type: string | Default: ""
System prompt to guide model behavior (optional).
max_new_tokens | Type: integer | Default: 256 | Range: 1 to 8192
Maximum number of tokens to generate.
enable_thinking | Type: boolean | Default: true
Enable reasoning/thinking mode for complex problems. Set to False for greedy search.
repetition_penalty | Type: number | Default: 1.1 | Range: 1 to 2
Penalty for repeating tokens. Higher values reduce repetition.
Output Schema

Output

Type: array β€’ Items Type: string
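Because the output is an array of string chunks rather than a single string, client code typically concatenates the items to recover the full generation. A minimal sketch, using stand-in chunks in place of a live API response:

```python
# The output schema is a list of string chunks; joining them
# reconstructs the complete generated text.
chunks = [
    "Silicon hearts beat,\n",
    "Parallel dreams in siliconβ€”\n",
    "Lightning renders fast.\n",
]
text = "".join(chunks)
print(text)
```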

Example Execution Logs
NemotronH requires an initialized `NemotronHHybridDynamicCache` to return a cache. None was provided, so no cache will be returned.
Version Details
Version ID
135b4a9c545002830563436c88ea56b401d135faa59da6773bc5934d2ae56344
Version Created
December 15, 2025
Run on Replicate β†’