nvidia/nemotron-3-nano-30b-a3b

▢️ 261 runs πŸ“… Dec 2025 βš™οΈ Cog 0.16.9 πŸ”— GitHub βš–οΈ License
code-generation long-context text-generation

About

Nemotron-3-Nano-30B-A3B is a large language model (LLM) trained from scratch by NVIDIA.

Example Output

Prompt:

"write a haiku about GPUs"

Output

Silicon hearts beat,
Parallel dreams in siliconβ€”
Lightning renders fast.

Performance Metrics

44.38s Prediction Time
44.49s Total Time
All Input Parameters
{
  "top_k": 50,
  "top_p": 1,
  "prompt": "write a haiku about GPUs",
  "temperature": 1,
  "system_prompt": "",
  "max_new_tokens": 256,
  "enable_thinking": false,
  "repetition_penalty": 1.1
}
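The example run above can be reproduced with the Replicate Python client (`pip install replicate`). This is a minimal sketch: the payload mirrors the "All Input Parameters" JSON, the model identifier comes from this page, and the API call is only attempted when a `REPLICATE_API_TOKEN` is set in the environment.

```python
import os

# Input payload mirroring the example run shown above.
payload = {
    "prompt": "write a haiku about GPUs",
    "system_prompt": "",
    "temperature": 1,
    "top_p": 1,
    "top_k": 50,
    "max_new_tokens": 256,
    "repetition_penalty": 1.1,
    "enable_thinking": False,
}

if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # third-party client: pip install replicate

    # The output schema is an array of strings; join the chunks into one text.
    output = replicate.run("nvidia/nemotron-3-nano-30b-a3b", input=payload)
    print("".join(output))
```

Per the schema below, `enable_thinking` defaults to true; the example run disabled it, which is why the haiku contains no reasoning trace.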
Input Parameters
top_k | Type: integer | Default: 50 | Range: 0 to 100
Top-k sampling. Lower values make output more focused.
top_p | Type: number | Default: 1 | Range: 0 to 1
Top-p (nucleus) sampling. Use 1.0 for reasoning tasks, 0.95 for tool calling.
prompt (required) | Type: string
Input prompt for the model.
temperature | Type: number | Default: 1 | Range: 0 to 2
Temperature for sampling. Use 1.0 for reasoning tasks, 0.6 for tool calling.
system_prompt | Type: string | Default: ""
System prompt to guide model behavior (optional).
max_new_tokens | Type: integer | Default: 256 | Range: 1 to 8192
Maximum number of tokens to generate.
enable_thinking | Type: boolean | Default: true
Enable reasoning/thinking mode for complex problems. Set to False for greedy search.
repetition_penalty | Type: number | Default: 1.1 | Range: 1 to 2
Penalty for repeating tokens. Higher values reduce repetition.
Output Schema

Output

Type: array β€’ Items Type: string
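Because the output is an array of string chunks rather than a single string, client code typically concatenates the items to recover the full generation. A minimal sketch, using stand-in chunks in place of a live API response:

```python
# The output schema is a list of string chunks; joining them
# reconstructs the complete generated text.
chunks = [
    "Silicon hearts beat,\n",
    "Parallel dreams in siliconβ€”\n",
    "Lightning renders fast.\n",
]
text = "".join(chunks)
print(text)
```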

Example Execution Logs
NemotronH requires an initialized `NemotronHHybridDynamicCache` to return a cache. None was provided, so no cache will be returned.
Version Details
Version ID
135b4a9c545002830563436c88ea56b401d135faa59da6773bc5934d2ae56344
Version Created
December 15, 2025
Run on Replicate β†’