openai/gpt-oss-120b

⭐ Official · ▶️ 107.6K runs · 📅 Aug 2025 · ⚙️ Cog 0.16.8 · 📄 Paper · ⚖️ License
Tags: code-generation, question-answering, text-generation

About

A 120B-parameter open-weight language model from OpenAI.

Example Output

Prompt:

"What does the company Replicate do?"

Output

Replicate is a cloud platform that makes it easy to run, share, and integrate open‑source machine‑learning models.

Core offering

  • Model hosting & execution – Users can upload any public or private model (e.g., Stable Diffusion, Whisper, CLIP, Llama, etc.) and Replicate spins up the necessary infrastructure to run it on demand.
  • API‑first access – Each model gets a REST API endpoint (and optional Python SDK) so developers can call the model from web apps, scripts, or other services without managing GPUs or containers themselves.
  • Versioning & reproducibility – Models are version‑controlled (Git‑style) and the platform records the exact code, weights, and environment used for each run, ensuring results are reproducible.
  • Scalable compute – Replicate automatically provisions the appropriate GPU/CPU resources, handling scaling, queuing, and billing per inference request.
  • Marketplace – A public catalog where creators publish models that others can try for free or pay per‑use, fostering a community‑driven ecosystem.
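The API-first access described above can be sketched as follows. The endpoint path mirrors Replicate's documented predictions API, but the exact fields and helper shown here are illustrative, not authoritative:

```python
# Sketch of calling a hosted model through Replicate's HTTP API.
# The endpoint path mirrors Replicate's documented predictions API;
# the helper function and exact fields are illustrative, not authoritative.
import json

API_URL = "https://api.replicate.com/v1/models/openai/gpt-oss-120b/predictions"

def build_prediction_request(prompt: str, **params) -> dict:
    """Assemble the JSON body for a prediction request."""
    return {"input": {"prompt": prompt, **params}}

body = build_prediction_request(
    "What does the company Replicate do?",
    max_tokens=1024,
    temperature=0.1,
)
# A real call would POST `body` to API_URL with an Authorization bearer token,
# or use the official client: replicate.run("openai/gpt-oss-120b", input=body["input"])
print(json.dumps(body, indent=2))
```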

Typical use cases

| Use case | How Replicate helps |
| --- | --- |
| Rapid prototyping | Spin up a model with a single CLI command or API call; no need to set up Docker, CUDA, or cloud VMs. |
| Production inference | Deploy a stable endpoint that scales with traffic; pay only for the compute you actually use. |
| Integrating AI into products | Add image generation, speech‑to‑text, text summarization, etc., by calling a simple HTTP endpoint from your app. |
| Research & experimentation | Fork existing models, tweak hyper‑parameters, and instantly test new versions without re‑installing dependencies. |
| Monetizing models | Publish a model on the marketplace, set a price per inference, and earn revenue while Replicate handles billing and infrastructure. |

Business model

  • Pay‑as‑you‑go: Users are charged per compute second (GPU/CPU) used by each inference request.
  • Marketplace fees: Replicate takes a percentage of revenue when creators sell access to their models.
  • Enterprise plans: Custom SLAs, private VPC deployment, and dedicated support for larger organizations.
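As a rough illustration of the pay-as-you-go model, a bill is just metered compute time multiplied by a per-second rate. The rate below is hypothetical, not Replicate's actual pricing:

```python
def estimate_cost(compute_seconds: float, rate_per_second: float) -> float:
    """Pay-as-you-go: metered compute seconds times a per-second rate."""
    return compute_seconds * rate_per_second

# The example prediction above took 5.85 s; $0.001/s is a made-up rate.
print(f"${estimate_cost(5.85, 0.001):.5f}")  # $0.00585
```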

Why it matters

  • Lower barrier to entry – Developers and companies can leverage powerful, often expensive‑to‑run models without owning GPUs.
  • Speed to market – Reduces the time from model selection to production-ready API from weeks to minutes.
  • Community‑driven – By surfacing a catalog of ready‑made models, Replicate accelerates reuse and collaboration across the AI ecosystem.

In short, Replicate provides an infrastructure‑as‑a‑service for running any machine‑learning model, turning heavyweight GPU workloads into simple, on‑demand API calls.

Performance Metrics

Prediction time: 5.85 s
Total time: 5.86 s
All Input Parameters
{
  "top_p": 1,
  "prompt": "What does the company Replicate do?",
  "max_tokens": 1024,
  "temperature": 0.1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
Input Parameters
top_p (number, default: 1): Top-p (nucleus) sampling.
prompt (string): The prompt.
max_tokens (integer, default: 2048): The maximum number of tokens the model should generate as output.
temperature (number, default: 0.1): The value used to modulate the next-token probabilities.
presence_penalty (number, default: 0): Presence penalty.
frequency_penalty (number, default: 0): Frequency penalty.
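A client would typically merge these defaults with per-request overrides before calling the model. A minimal sketch, assuming the defaults listed above (the merge helper itself is illustrative, not part of any Replicate SDK):

```python
# Defaults taken from the parameter list above; the merge helper is
# illustrative, not part of any Replicate SDK.
DEFAULTS = {
    "top_p": 1,
    "max_tokens": 2048,
    "temperature": 0.1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
}

def make_input(prompt: str, **overrides) -> dict:
    """Merge user overrides over the documented defaults."""
    return {**DEFAULTS, "prompt": prompt, **overrides}

inp = make_input("What does the company Replicate do?", max_tokens=1024)
print(inp["max_tokens"], inp["top_p"])  # 1024 1
```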
Output Schema

Output

Type: array (items: string)

Example Execution Logs
Prompt: What does the company Replicate do?
Input token count: 8
Output token count: 613
TTFT: 1.09s
Tokens per second: 104.81
Total time: 5.85s
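The throughput figure in the log can be roughly reproduced from the other numbers: output tokens divided by total time. The small gap from the logged 104.81 is presumably due to finer-grained timing on the server:

```python
# Numbers taken from the execution log above.
output_tokens = 613
total_time_s = 5.85

tokens_per_second = output_tokens / total_time_s
print(round(tokens_per_second, 2))  # close to the logged 104.81
```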
Version Details
Version ID
a58b20fe590855618e2f14be1843e609ee0fd589116d13372c433fe1f55dd598
Version Created
October 13, 2025
Run on Replicate →