openai/gpt-oss-120b

⭐ Official · ▶️ 107.6K runs · 📅 Aug 2025 · ⚙️ Cog 0.16.8 · 📄 Paper · ⚖️ License
Tags: code-generation, question-answering, text-generation

About

A 120B-parameter open-weight language model from OpenAI.

Example Output

Prompt:

"What does the company Replicate do?"

Output

Replicate is a cloud platform that makes it easy to run, share, and integrate open‑source machine‑learning models.

Core offering

  • Model hosting & execution – Users can upload any public or private model (e.g., Stable Diffusion, Whisper, CLIP, Llama, etc.) and Replicate spins up the necessary infrastructure to run it on demand.
  • API‑first access – Each model gets a REST API endpoint (and optional Python SDK) so developers can call the model from web apps, scripts, or other services without managing GPUs or containers themselves.
  • Versioning & reproducibility – Models are version‑controlled (Git‑style) and the platform records the exact code, weights, and environment used for each run, ensuring results are reproducible.
  • Scalable compute – Replicate automatically provisions the appropriate GPU/CPU resources, handling scaling, queuing, and billing per inference request.
  • Marketplace – A public catalog where creators publish models that others can try for free or pay per‑use, fostering a community‑driven ecosystem.
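The API-first access described above can be sketched as follows. The endpoint path mirrors Replicate's documented predictions API, but the exact fields and helper shown here are illustrative, not authoritative:

```python
# Sketch of calling a hosted model through Replicate's HTTP API.
# The endpoint path mirrors Replicate's documented predictions API;
# the helper function and exact fields are illustrative, not authoritative.
import json

API_URL = "https://api.replicate.com/v1/models/openai/gpt-oss-120b/predictions"

def build_prediction_request(prompt: str, **params) -> dict:
    """Assemble the JSON body for a prediction request."""
    return {"input": {"prompt": prompt, **params}}

body = build_prediction_request(
    "What does the company Replicate do?",
    max_tokens=1024,
    temperature=0.1,
)
# A real call would POST `body` to API_URL with an Authorization bearer token,
# or use the official client: replicate.run("openai/gpt-oss-120b", input=body["input"])
print(json.dumps(body, indent=2))
```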

Typical use cases

| Use case | How Replicate helps |
| --- | --- |
| Rapid prototyping | Spin up a model with a single CLI command or API call; no need to set up Docker, CUDA, or cloud VMs. |
| Production inference | Deploy a stable endpoint that scales with traffic; pay only for the compute you actually use. |
| Integrating AI into products | Add image generation, speech‑to‑text, text summarization, etc., by calling a simple HTTP endpoint from your app. |
| Research & experimentation | Fork existing models, tweak hyper‑parameters, and instantly test new versions without re‑installing dependencies. |
| Monetizing models | Publish a model on the marketplace, set a price per inference, and earn revenue while Replicate handles billing and infrastructure. |

Business model

  • Pay‑as‑you‑go: Users are charged per compute second (GPU/CPU) used by each inference request.
  • Marketplace fees: Replicate takes a percentage of revenue when creators sell access to their models.
  • Enterprise plans: Custom SLAs, private VPC deployment, and dedicated support for larger organizations.
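As a rough illustration of the pay-as-you-go model, a bill is just metered compute time multiplied by a per-second rate. The rate below is hypothetical, not Replicate's actual pricing:

```python
def estimate_cost(compute_seconds: float, rate_per_second: float) -> float:
    """Pay-as-you-go: metered compute seconds times a per-second rate."""
    return compute_seconds * rate_per_second

# The example prediction above took 5.85 s; $0.001/s is a made-up rate.
print(f"${estimate_cost(5.85, 0.001):.5f}")  # $0.00585
```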

Why it matters

  • Lower barrier to entry – Developers and companies can leverage powerful, often expensive‑to‑run models without owning GPUs.
  • Speed to market – Reduces the time from model selection to production-ready API from weeks to minutes.
  • Community‑driven – By surfacing a catalog of ready‑made models, Replicate accelerates reuse and collaboration across the AI ecosystem.

In short, Replicate provides an infrastructure‑as‑a‑service for running any machine‑learning model, turning heavyweight GPU workloads into simple, on‑demand API calls.

Performance Metrics

Prediction time: 5.85 s
Total time: 5.86 s
All Input Parameters
{
  "top_p": 1,
  "prompt": "What does the company Replicate do?",
  "max_tokens": 1024,
  "temperature": 0.1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
Input Parameters
top_p (number, default: 1): Top-p (nucleus) sampling.
prompt (string): The prompt.
max_tokens (integer, default: 2048): The maximum number of tokens the model should generate as output.
temperature (number, default: 0.1): The value used to modulate the next-token probabilities.
presence_penalty (number, default: 0): Presence penalty.
frequency_penalty (number, default: 0): Frequency penalty.
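A client would typically merge these defaults with per-request overrides before calling the model. A minimal sketch, assuming the defaults listed above (the merge helper itself is illustrative, not part of any Replicate SDK):

```python
# Defaults taken from the parameter list above; the merge helper is
# illustrative, not part of any Replicate SDK.
DEFAULTS = {
    "top_p": 1,
    "max_tokens": 2048,
    "temperature": 0.1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
}

def make_input(prompt: str, **overrides) -> dict:
    """Merge user overrides over the documented defaults."""
    return {**DEFAULTS, "prompt": prompt, **overrides}

inp = make_input("What does the company Replicate do?", max_tokens=1024)
print(inp["max_tokens"], inp["top_p"])  # 1024 1
```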
Output Schema

Output

Type: array (items: string)

Example Execution Logs
Prompt: What does the company Replicate do?
Input token count: 8
Output token count: 613
TTFT: 1.09s
Tokens per second: 104.81
Total time: 5.85s
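The throughput figure in the log can be roughly reproduced from the other numbers: output tokens divided by total time. The small gap from the logged 104.81 is presumably due to finer-grained timing on the server:

```python
# Numbers taken from the execution log above.
output_tokens = 613
total_time_s = 5.85

tokens_per_second = output_tokens / total_time_s
print(round(tokens_per_second, 2))  # close to the logged 104.81
```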
Version Details
Version ID
a58b20fe590855618e2f14be1843e609ee0fd589116d13372c433fe1f55dd598
Version Created
October 13, 2025
Run on Replicate →