google/gemini-3-pro 🖼️🔢📝❓ → 📝

⭐ Official ▶️ 1.3M runs 📅 Nov 2025 ⚙️ Cog 0.21.0

code-generation data-analysis document-summarization image-analysis image-captioning image-to-text question-answering text-generation video-to-text visual-understanding

Performance

12.1s Typical run time

1.3M Total runs

About

Google's most advanced reasoning Gemini model

Example Output

Prompt:

"Why should I be scared of this audio?"

Output

You shouldn't be scared for your safety, but you might be scared for your pride.

This audio is the 1987 hit song "Never Gonna Give You Up" by Rick Astley.

The reason you might fear it is because of the internet phenomenon known as "Rickrolling."

Here is the context:

The Bait and Switch: A "Rickroll" is a prank where someone sends a link that claims to be something exciting (like a leaked movie trailer, free money, or breaking news).
The Trap: When you click the link, instead of getting what you were promised, you are confronted with the music video for this song.
The "Fear": The fear associated with this audio is the paranoia that any link you click on the internet might be a trap. Hearing those opening drums means you have been tricked, you let your guard down, and you lost the game.

So, if you hear this audio unexpectedly, it doesn't mean you are in danger—it just means someone successfully pulled a prank on you.

Performance Metrics

12.15s Prediction Time

12.36s Total Time

All Input Parameters

{
  "audio": "https://replicate.delivery/pbxt/O5Vw2eTOp7z4V27QYXqEUQZ5OvwTEKj2TVf3syi4dTJpvUG9/Never%20Gonna%20Give%20You%20Up%20-%20Rick%20Astley.mp3",
  "top_p": 0.95,
  "images": [],
  "prompt": "Why should I be scared of this audio?",
  "videos": [],
  "temperature": 1,
  "thinking_level": "low",
  "max_output_tokens": 65535
}
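
The parameters above could be sent from Python roughly as follows. This is a minimal sketch, not an official snippet: it assumes the `replicate` client package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_prediction` are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Optional


def build_input(prompt: str, audio_url: Optional[str] = None) -> Dict[str, Any]:
    """Assemble an input payload mirroring the example parameters above."""
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "top_p": 0.95,
        "temperature": 1,
        "thinking_level": "low",
        "max_output_tokens": 65535,
        "images": [],
        "videos": [],
    }
    # Only include the audio key when a file is actually supplied.
    if audio_url is not None:
        payload["audio"] = audio_url
    return payload


def run_prediction(payload: Dict[str, Any]) -> Any:
    """Send the payload to Replicate (requires network access and an API token)."""
    import replicate  # imported lazily so build_input works without the package

    # Assumption: the model is addressable by the name shown in the page title.
    return replicate.run("google/gemini-3-pro", input=payload)
```

Calling `run_prediction(build_input("Why should I be scared of this audio?", audio_url))` would start a prediction; `build_input` alone can be used to inspect the payload without touching the network.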

Input Parameters

audio (Type: string, Default: null): Input audio to send with the prompt (max 1 audio file, up to 8.4 hours)
top_p (Type: number, Default: 0.95, Range: 0 - 1): Nucleus sampling parameter; the model considers the results of the tokens with top_p probability mass
images (Type: array, Default: (empty)): Input images to send with the prompt (max 10 images, each up to 7MB)
prompt (required, Type: string): The text prompt to send to the model
videos (Type: array, Default: (empty)): Input videos to send with the prompt (max 10 videos, each up to 45 minutes)
video_fps (Type: number, Default: null, Range: 0.1 - 60): Frame rate (frames per second) at which to sample input videos. Higher values give the model more frames to reason over (useful for fast-moving footage) but consume more tokens. Omit to use the default sampling rate (1 fps).
temperature (Type: number, Default: 1, Range: 0 - 2): Sampling temperature between 0 and 2
thinking_level (Default: none): Thinking level for reasoning (low or high). Replaces thinking_budget for Gemini 3 models.
max_output_tokens (Type: integer, Default: 65535, Range: 1 - 65535): Maximum number of tokens to generate
system_instruction (Type: string, Default: null): System instruction to guide the model's behavior
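
The documented ranges and limits can be checked client-side before a prediction is submitted. The sketch below is one possible way to do that; `validate_input` and the two lookup tables are hypothetical names, and the checks cover only the constraints listed above (file size and duration limits are not verified).

```python
from typing import Any, Dict, List

# Numeric ranges as listed in the parameter descriptions above.
NUMERIC_RANGES = {
    "top_p": (0, 1),
    "temperature": (0, 2),
    "video_fps": (0.1, 60),
    "max_output_tokens": (1, 65535),
}
# Maximum number of items for the array-typed inputs.
MAX_ITEMS = {"images": 10, "videos": 10}


def validate_input(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    errors: List[str] = []
    if not isinstance(params.get("prompt"), str):
        errors.append("prompt is required and must be a string")
    for name, (low, high) in NUMERIC_RANGES.items():
        value = params.get(name)
        # Omitted or null values fall back to the model's defaults.
        if value is not None and not (low <= value <= high):
            errors.append(f"{name} must be between {low} and {high}")
    for name, limit in MAX_ITEMS.items():
        if len(params.get(name) or []) > limit:
            errors.append(f"{name} accepts at most {limit} items")
    level = params.get("thinking_level")
    if level is not None and level not in ("low", "high"):
        errors.append("thinking_level must be 'low' or 'high'")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.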

Output Schema

Output

Type: array • Items Type: string
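
Because the output is an array of strings, a caller usually needs to combine the items into one piece of text. A minimal sketch, assuming the items are consecutive chunks that concatenate without a separator (common for streamed text models, but not stated on this page); `output_to_text` is a hypothetical helper name.

```python
from typing import Iterable, Union


def output_to_text(output: Union[str, Iterable[str]]) -> str:
    """Join an array (or iterator) of text chunks into a single string."""
    if isinstance(output, str):
        # Already a single string; nothing to join.
        return output
    return "".join(output)
```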

Example Execution Logs

Generating text with gemini-3-pro-preview...
Uploading audio cog-input-1267550957.mp3 to Gemini Files API...
Uploaded cog-input-1267550957.mp3, waiting for processing...
Audio ready (took 0s)
Warning: there are non-text parts in the response: ['thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.
Warning: there are non-text parts in the response: ['thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.
Text generation completed in 12.14s
Input token count: 5359
Output token count: 235
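
The token counts at the end of the logs can be extracted for usage tracking. The sketch below assumes the log lines keep the exact `Input token count: N` / `Output token count: N` wording shown above, which may change between versions; `parse_token_counts` is a hypothetical helper.

```python
import re
from typing import Dict


def parse_token_counts(logs: str) -> Dict[str, int]:
    """Extract input/output token counts from prediction logs."""
    counts: Dict[str, int] = {}
    pattern = r"^(Input|Output) token count: (\d+)$"
    for kind, value in re.findall(pattern, logs, flags=re.MULTILINE):
        counts[kind.lower()] = int(value)
    return counts
```

Applied to the logs above, this yields an input count of 5359 and an output count of 235.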

Version Details

Version ID: f643f4be1635546a523e4cedf4497efbab541eb831df678a89f4bff9400a541a
Version Created: June 19, 2026
