google/gemini-3-pro

Official · 398.2K runs · Nov 2025 · Cog 0.16.9
image-analysis image-captioning image-to-text text-generation video-to-text visual-understanding

About

Google's most advanced reasoning Gemini model

Example Output

Prompt:

"Why should I be scared of this audio?"

Output

You shouldn't be scared for your safety, but you might be scared for your pride.

This audio is the 1987 hit song "Never Gonna Give You Up" by Rick Astley.

The reason you might fear it is because of the internet phenomenon known as "Rickrolling."

Here is the context:

  1. The Bait and Switch: A "Rickroll" is a prank where someone sends a link that claims to be something exciting (like a leaked movie trailer, free money, or breaking news).
  2. The Trap: When you click the link, instead of getting what you were promised, you are confronted with the music video for this song.
  3. The "Fear": The fear associated with this audio is the paranoia that any link you click on the internet might be a trap. Hearing those opening drums means you have been tricked, you let your guard down, and you lost the game.

So, if you hear this audio unexpectedly, it doesn't mean you are in danger; it just means someone successfully pulled a prank on you.

Performance Metrics

12.15s Prediction Time
12.36s Total Time
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/O5Vw2eTOp7z4V27QYXqEUQZ5OvwTEKj2TVf3syi4dTJpvUG9/Never%20Gonna%20Give%20You%20Up%20-%20Rick%20Astley.mp3",
  "top_p": 0.95,
  "images": [],
  "prompt": "Why should I be scared of this audio?",
  "videos": [],
  "temperature": 1,
  "thinking_level": "low",
  "max_output_tokens": 65535
}
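The input above could be submitted from Python roughly as follows. This is a minimal sketch, assuming the `replicate` client package is installed and a `REPLICATE_API_TOKEN` is set in the environment. `build_input` and `run_gemini` are hypothetical helper names, not part of any library, and the parameter values simply mirror the example.

```python
def build_input(prompt, audio=None, thinking_level="low"):
    """Assemble an input payload that mirrors the example above."""
    payload = {
        "prompt": prompt,
        "top_p": 0.95,
        "images": [],
        "videos": [],
        "temperature": 1,
        "thinking_level": thinking_level,
        "max_output_tokens": 65535,
    }
    # audio is optional, so only include it when a URL was supplied
    if audio is not None:
        payload["audio"] = audio
    return payload


def run_gemini(payload):
    """Send the payload to the model (needs network access and an API token)."""
    import replicate  # imported lazily so build_input works without the package

    # Assumption: the model is addressed by its owner/name slug
    output = replicate.run("google/gemini-3-pro", input=payload)
    return list(output)
```

Keeping payload construction separate from the network call makes it possible to inspect or log the exact input before any prediction is billed.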
Input Parameters
audio Type: string
Input audio to send with the prompt (max 1 audio file, up to 8.4 hours)
top_p Type: number, Default: 0.95, Range: 0 - 1
Nucleus sampling parameter: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p
images Type: array, Default: []
Input images to send with the prompt (max 10 images, each up to 7MB)
prompt (required) Type: string
The text prompt to send to the model
videos Type: array, Default: []
Input videos to send with the prompt (max 10 videos, each up to 45 minutes)
temperature Type: number, Default: 1, Range: 0 - 2
Sampling temperature between 0 and 2
thinking_level
Thinking level for reasoning (low or high). Replaces thinking_budget for Gemini 3 models.
max_output_tokens Type: integer, Default: 65535, Range: 1 - 65535
Maximum number of tokens to generate
system_instruction Type: string
System instruction to guide the model's behavior
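The documented ranges and limits can be checked on the client before a request is sent. The following is a sketch only: `validate_input` is a hypothetical helper, it covers just the constraints listed above, and it does not inspect file sizes or media durations.

```python
# Numeric ranges and list limits as listed in the parameter descriptions
RANGES = {"top_p": (0, 1), "temperature": (0, 2), "max_output_tokens": (1, 65535)}
MAX_ITEMS = {"images": 10, "videos": 10}


def validate_input(params):
    """Return a list of problems found in an input dict (empty if none)."""
    problems = []
    if not isinstance(params.get("prompt"), str):
        problems.append("prompt: required string is missing")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}: {params[name]} is outside {low} - {high}")
    if "max_output_tokens" in params and not isinstance(params["max_output_tokens"], int):
        problems.append("max_output_tokens: must be an integer")
    for name, limit in MAX_ITEMS.items():
        if len(params.get(name, [])) > limit:
            problems.append(f"{name}: more than {limit} items")
    if "thinking_level" in params and params["thinking_level"] not in ("low", "high"):
        problems.append("thinking_level: must be 'low' or 'high'")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all invalid fields in a single pass.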
Output Schema

Output

Type: array • Items Type: string
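Since the output is an array of strings, a caller will usually concatenate the items to recover the full response. The sketch below assumes the items are consecutive text fragments, which the schema itself does not state; the sample fragments are made up for illustration.

```python
def join_output(chunks):
    """Concatenate an array of string fragments into one response text."""
    return "".join(chunks)


# Illustrative fragments only, not real model output
sample = ["You shouldn't be scared ", "for your safety."]
text = join_output(sample)
```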

Example Execution Logs
Generating text with gemini-3-pro-preview...
Uploading audio cog-input-1267550957.mp3 to Gemini Files API...
Uploaded cog-input-1267550957.mp3, waiting for processing...
Audio ready (took 0s)
Warning: there are non-text parts in the response: ['thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.
Warning: there are non-text parts in the response: ['thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.
Text generation completed in 12.14s
Input token count: 5359
Output token count: 235
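The timing and token counts at the end of the logs can be pulled out with a few regular expressions, for example to estimate throughput. This is a sketch that assumes the log lines keep the wording shown above; `parse_log` is a hypothetical helper.

```python
import re

# The last three log lines from the example above
LOG = """\
Text generation completed in 12.14s
Input token count: 5359
Output token count: 235
"""


def parse_log(text):
    """Extract generation time and token counts from a log string."""
    patterns = {
        "seconds": (r"completed in ([\d.]+)s", float),
        "input_tokens": (r"Input token count: (\d+)", int),
        "output_tokens": (r"Output token count: (\d+)", int),
    }
    stats = {}
    for key, (pattern, cast) in patterns.items():
        match = re.search(pattern, text)
        if match:  # skip fields that are absent from the log
            stats[key] = cast(match.group(1))
    return stats


stats = parse_log(LOG)
# Output tokens generated per second of generation time
tokens_per_second = stats["output_tokens"] / stats["seconds"]
```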
Version Details
Version ID
6c727b6aa9d5663b515ff4d6d36520213d9991d5078adeafce44e6e49ed6f6ac
Version Created
November 25, 2025