google/veo-3.1 🔢🖼️📝❓✓ → 🖼️

⭐ Official ▶️ 10.3K runs 📅 Oct 2025 ⚙️ Cog 0.16.8
image-to-video image-to-video-with-audio text-to-video-with-audio video-consistent-character-generation

About

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, and support for reference images and a last frame

Example Output

Prompt:

"the woman is giving an interview for a podcast, wearing a pink top with the logo, it also neatly says "Veo 3.1", she is in a midcentury modern studio with pink lighting, she talks about using Veo 3.1 with reference images to put things into videos you're making, the logo is also in a framed picture against black behind her"

Performance Metrics

101.79s Prediction Time
101.96s Total Time
All Input Parameters
{
  "prompt": "the woman is giving an interview for a podcast, wearing a pink top with the logo, it also neatly says \"Veo 3.1\", she is in a midcentury modern studio with pink lighting, she talks about using Veo 3.1 with reference images to put things into videos you're making, the logo is also in a framed picture against black behind her",
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "generate_audio": true,
  "reference_images": [
    "https://replicate.delivery/pbxt/Nt8bL90QO5In3RDkC82HtqeXqNdITglTVpaicTgrdT8mtjiW/0_1.webp",
    "https://replicate.delivery/pbxt/Nt8bLbk1uz4EIMWhIQ0DyjO8BGJYYeAgQWgEnFUWNMOGEpbU/Screenshot%202025-08-26%20at%205.30.12%E2%80%AFPM.png"
  ]
}
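A minimal sketch of how an input like the one above could be submitted from Python. It assumes the third-party `replicate` client package is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `example_input` and `run_veo` are hypothetical and not part of this page; the field values mirror the example.

```python
MODEL = "google/veo-3.1"  # model name as shown at the top of this page


def example_input(prompt: str, reference_images: list[str]) -> dict:
    """Build an input dict with the same settings as the example above.

    Only the prompt and the reference image URLs vary.
    """
    return {
        "prompt": prompt,
        "duration": 8,
        "resolution": "1080p",
        "aspect_ratio": "16:9",
        "generate_audio": True,
        "reference_images": list(reference_images),  # copy, so the caller's list is untouched
    }


def run_veo(model_input: dict):
    """Submit the input and return whatever the client returns for the output.

    Assumption: the `replicate` package is installed and authenticated.
    It is imported lazily so that building inputs works without it.
    """
    import replicate

    return replicate.run(MODEL, input=model_input)


# Usage (makes a real API call, so it is left commented out):
# output = run_veo(example_input("a short prompt", ["https://example.com/ref.png"]))
```

The example prediction took roughly 100 seconds, so a blocking call like this is expected to wait for a comparable time before returning.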
Input Parameters
seed Type: integer
Random seed. Omit for random generations
image Type: string
Input image to start generating from. Ideal images are 16:9 or 9:16 and 1280x720 or 720x1280, depending on the aspect ratio you choose.
prompt (required) Type: string
Text prompt for video generation
duration Default: 8
Video duration in seconds
last_frame Type: string
Ending image for interpolation. When provided with an input image, creates a transition between the two images.
resolution Default: 720p
Resolution of the generated video
aspect_ratio Default: 16:9
Video aspect ratio
generate_audio Type: boolean Default: true
Generate audio with the video
negative_prompt Type: string
Description of what to exclude from the generated video
reference_images Type: array Default: (empty)
1 to 3 reference images for subject-consistent generation (reference-to-video, or R2V). Reference images only work with 16:9 aspect ratio and 8-second duration. Last frame is ignored if reference images are provided.
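The constraints described in the parameter list can be checked before submitting a request. The sketch below is not part of the model or of any client library: `check_veo_input` is a hypothetical helper that encodes only the rules stated above (a required prompt, at most 3 reference images, 16:9 and 8 seconds when reference images are used, and `last_frame` being ignored alongside reference images). Whether the model accepts `last_frame` without `image` is not stated here, so that case is reported only as a hint.

```python
def check_veo_input(model_input: dict) -> list[str]:
    """Return a list of warnings; an empty list means none of the rules below was hit."""
    problems = []

    # "prompt" is the only parameter marked as required.
    if not model_input.get("prompt"):
        problems.append("prompt is required")

    refs = model_input.get("reference_images") or []
    if refs:
        # The description allows 1 to 3 reference images.
        if len(refs) > 3:
            problems.append("at most 3 reference images are allowed")
        # Defaults (16:9, 8 seconds) are taken from the parameter list.
        if model_input.get("aspect_ratio", "16:9") != "16:9":
            problems.append("reference images only work with 16:9 aspect ratio")
        if model_input.get("duration", 8) != 8:
            problems.append("reference images only work with 8-second duration")
        if model_input.get("last_frame"):
            problems.append("last_frame is ignored when reference images are provided")
    elif model_input.get("last_frame") and not model_input.get("image"):
        # Assumption: the page only describes last_frame together with an input image.
        problems.append("last_frame is described as a transition from an input image, but no image was given")

    return problems
```

For instance, an input that combines reference images with a 9:16 aspect ratio would yield one warning, while an input containing only a prompt yields none.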
Output Schema

Output

Type: string Format: uri

Example Execution Logs
Using seed: 310333719
Starting video generation...
Still generating...
Still generating...
Generated video in 99.76 seconds
Downloading video...
Downloaded video in 0.42 seconds
Version Details
Version ID
a55204f92195a6c535170095e221116968f43614517d8ad32b338fa12ee4460b
Version Created
October 15, 2025