google/gemini-3-pro

⭐ Official ▶️ 1.3M runs 📅 Nov 2025 ⚙️ Cog 0.20.0
code-generation data-analysis document-summarization image-analysis image-captioning image-to-text question-answering text-generation video-to-text visual-understanding

About

Google's most advanced reasoning Gemini model

Example Output

Prompt:

"Why should I be scared of this audio?"

Output:

You shouldn't be scared for your safety, but you might be scared for your pride.

This audio is the 1987 hit song "Never Gonna Give You Up" by Rick Astley.

The reason you might fear it is because of the internet phenomenon known as "Rickrolling."

Here is the context:

  1. The Bait and Switch: A "Rickroll" is a prank where someone sends a link that claims to be something exciting (like a leaked movie trailer, free money, or breaking news).
  2. The Trap: When you click the link, instead of getting what you were promised, you are confronted with the music video for this song.
  3. The "Fear": The fear associated with this audio is the paranoia that any link you click on the internet might be a trap. Hearing those opening drums means you have been tricked, you let your guard down, and you lost the game.

So, if you hear this audio unexpectedly, it doesn't mean you are in danger; it just means someone successfully pulled a prank on you.

Performance Metrics

12.15s Prediction Time
12.36s Total Time
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/O5Vw2eTOp7z4V27QYXqEUQZ5OvwTEKj2TVf3syi4dTJpvUG9/Never%20Gonna%20Give%20You%20Up%20-%20Rick%20Astley.mp3",
  "top_p": 0.95,
  "images": [],
  "prompt": "Why should I be scared of this audio?",
  "videos": [],
  "temperature": 1,
  "thinking_level": "low",
  "max_output_tokens": 65535
}
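The parameter set above maps directly onto a request payload. The following is a minimal sketch, assuming the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_prediction` are hypothetical, and the model reference string is taken from the page title rather than from a pinned version.

```python
from typing import Any, Dict, List, Optional


def build_input(prompt: str, audio: Optional[str] = None,
                thinking_level: Optional[str] = "low") -> Dict[str, Any]:
    """Assemble a payload mirroring the example above (hypothetical helper)."""
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "top_p": 0.95,
        "temperature": 1,
        "max_output_tokens": 65535,
        "images": [],
        "videos": [],
    }
    # Optional fields are only sent when the caller provides them.
    if audio is not None:
        payload["audio"] = audio
    if thinking_level is not None:
        payload["thinking_level"] = thinking_level
    return payload


def run_prediction(payload: Dict[str, Any]) -> List[str]:
    """Send the payload to Replicate. Needs network access and an API token."""
    import replicate  # imported lazily so build_input works without the client

    output = replicate.run("google/gemini-3-pro", input=payload)
    # Assumption: the result is an iterable of text chunks, per the output schema.
    return list(output)
```

Keeping payload construction separate from the network call makes the payload easy to inspect or log before any prediction time is spent.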
Input Parameters
audio Type: string • Default: null
Input audio to send with the prompt (max 1 audio file, up to 8.4 hours)
top_p Type: number • Default: 0.95 • Range: 0 - 1
Nucleus sampling parameter: the model samples only from the smallest set of tokens whose cumulative probability mass reaches top_p
images Type: array • Default: []
Input images to send with the prompt (max 10 images, each up to 7MB)
prompt (required) Type: string
The text prompt to send to the model
videos Type: array • Default: []
Input videos to send with the prompt (max 10 videos, each up to 45 minutes)
video_fps Type: number • Default: null • Range: 0.1 - 60
Frame rate (frames per second) at which to sample input videos. Higher values give the model more frames to reason over (useful for fast-moving footage) but consume more tokens. Omit to use the default sampling rate (1 fps).
temperature Type: number • Default: 1 • Range: 0 - 2
Sampling temperature between 0 and 2
thinking_level Default: none
Thinking level for reasoning (low or high). Replaces thinking_budget for Gemini 3 models.
max_output_tokens Type: integer • Default: 65535 • Range: 1 - 65535
Maximum number of tokens to generate
system_instruction Type: string • Default: null
System instruction to guide the model's behavior
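The documented ranges and limits can be checked on the client before a request is sent. This is a rough sketch only: `validate_input` is a hypothetical helper, the numeric bounds are copied from the parameter list above, and file-size and duration limits (7MB per image, 45 minutes per video, 8.4 hours of audio) are not checked because that would require inspecting the files themselves. The service presumably performs its own validation regardless.

```python
from typing import Any, Dict, List

# Inclusive numeric ranges, copied from the parameter list above.
RANGES = {
    "top_p": (0, 1),
    "temperature": (0, 2),
    "video_fps": (0.1, 60),
    "max_output_tokens": (1, 65535),
}
MAX_IMAGES = 10
MAX_VIDEOS = 10
THINKING_LEVELS = ("low", "high")


def validate_input(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    errors: List[str] = []
    if not isinstance(payload.get("prompt"), str):
        errors.append("prompt is required and must be a string")
    for name, (low, high) in RANGES.items():
        value = payload.get(name)
        if value is None:
            continue  # optional parameter left unset
        if not isinstance(value, (int, float)):
            errors.append(f"{name} must be a number")
            continue
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low} - {high}")
    tokens = payload.get("max_output_tokens")
    if isinstance(tokens, float):
        errors.append("max_output_tokens must be an integer")
    level = payload.get("thinking_level")
    if level is not None and level not in THINKING_LEVELS:
        errors.append(f"thinking_level must be one of {THINKING_LEVELS}")
    if len(payload.get("images") or []) > MAX_IMAGES:
        errors.append(f"at most {MAX_IMAGES} images are allowed")
    if len(payload.get("videos") or []) > MAX_VIDEOS:
        errors.append(f"at most {MAX_VIDEOS} videos are allowed")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.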
Output Schema

Output

Type: array • Items Type: string
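Because the output is an array of strings rather than a single string, a caller typically concatenates the items to recover the full response. A small sketch, assuming the items are consecutive text chunks that need no separator; `join_output` is a hypothetical helper name.

```python
from typing import Iterable


def join_output(chunks: Iterable[str]) -> str:
    """Concatenate text chunks in order, assuming they need no separator."""
    return "".join(chunks)


# Usage with a hand-written list standing in for a real prediction result.
text = join_output(["You shouldn't be scared ", "for your safety."])
print(text)
```

The helper accepts any iterable, so it would work equally with a list or with a generator that yields chunks as they arrive.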

Example Execution Logs
Generating text with gemini-3-pro-preview...
Uploading audio cog-input-1267550957.mp3 to Gemini Files API...
Uploaded cog-input-1267550957.mp3, waiting for processing...
Audio ready (took 0s)
Warning: there are non-text parts in the response: ['thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.
Warning: there are non-text parts in the response: ['thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.
Text generation completed in 12.14s
Input token count: 5359
Output token count: 235
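The token counts and generation time in these logs can be extracted to estimate throughput. The sketch below assumes the log lines keep exactly the wording shown above, which is not a documented or stable format; `parse_logs` and `output_tokens_per_second` are hypothetical helpers.

```python
import re
from typing import Dict, Optional

# Patterns match the log wording shown above; the format is an assumption.
PATTERNS = {
    "input_tokens": r"Input token count: (\d+)",
    "output_tokens": r"Output token count: (\d+)",
    "seconds": r"Text generation completed in ([\d.]+)s",
}


def parse_logs(logs: str) -> Dict[str, float]:
    """Pull token counts and generation time out of the raw log text."""
    stats: Dict[str, float] = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, logs)
        if match:
            stats[key] = float(match.group(1))
    return stats


def output_tokens_per_second(stats: Dict[str, float]) -> Optional[float]:
    """Return output tokens per second, or None if a value is missing or zero."""
    if "output_tokens" in stats and stats.get("seconds"):
        return stats["output_tokens"] / stats["seconds"]
    return None
```

Note that the reported generation time includes any reasoning the model performs before answering, so the figure is an end-to-end rate rather than a pure decoding speed.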
Version Details
Version ID
9e9bc27bec7a7ef7dcddee23839719fa9bf6980a51b698ff4e7468c1517cf4ff
Version Created
May 21, 2026