fofr/chained-video 🔢✓📝 → 🖼️

▶️ 34 runs 📅 Aug 2025 ⚙️ Cog 0.14.0
dialogue-check text-to-video video-chaining video-generation

Example Output

Performance Metrics

307.12s Prediction Time
307.95s Total Time
All Input Parameters
{
  "chain_length": 3,
  "check_dialogue": true,
  "initial_prompt": "a news presenter gets confused during a broadcast",
  "use_fast_model": false
}
Input Parameters
chain_length Type: integer, Default: 3, Range: 1 - 10
Number of videos to chain together
check_dialogue Type: boolean, Default: false
Transcribe any dialogue or speech in each video and include it in the context used to write the next prompt
initial_prompt (required) Type: string
Initial video prompt
use_fast_model Type: boolean, Default: true
Use Wan 2.2 5B (faster/cheaper) instead of Veo-3-fast
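
A minimal sketch of calling this model from Python, assuming the official `replicate` client is installed and `REPLICATE_API_TOKEN` is set in the environment. The parameter names, defaults, and the 1 to 10 range come from the list above, and the version ID is the one under Version Details. The helper names `build_input` and `run_chained_video` are hypothetical and only for illustration.

```python
from typing import Any, Dict

# Model reference built from the model name and the version ID listed on this page.
MODEL_REF = (
    "fofr/chained-video:"
    "49f281ffb3251233e54d9952b7903f607022f5be04cd5338e6a6b2aae5f9be0a"
)


def build_input(
    initial_prompt: str,
    chain_length: int = 3,
    check_dialogue: bool = False,
    use_fast_model: bool = True,
) -> Dict[str, Any]:
    """Validate the documented inputs locally and return the input payload."""
    if not initial_prompt:
        raise ValueError("initial_prompt is required")
    if not 1 <= chain_length <= 10:
        raise ValueError("chain_length must be between 1 and 10")
    return {
        "initial_prompt": initial_prompt,
        "chain_length": chain_length,
        "check_dialogue": check_dialogue,
        "use_fast_model": use_fast_model,
    }


def run_chained_video(**kwargs: Any) -> Any:
    """Run the model on Replicate; the output schema describes a single URI."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_REF, input=build_input(**kwargs))
```

Calling `run_chained_video(initial_prompt="a news presenter gets confused during a broadcast", check_dialogue=True, use_fast_model=False)` would reproduce the example inputs shown above. The example run took about five minutes, so a blocking call may take a while.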
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
Using Veo-3-fast for video generation
Generating initial video with prompt: a news presenter gets confused during a broadcast
=== Generating video 2 of 3 ===
Extracting last frame...
Video path: /tmp/e5c1b3d9867fa3d1/tmp1utwsd9h.mp4
Video path type: <class 'replicate.use.URLPath'>
Resolved video path: /tmp/e5c1b3d9867fa3d1/tmp1utwsd9h.mp4
Video file exists, size: 1665811 bytes
Getting total frame count...
Total frames: 192, extracting frame index: 191
Extracting exact last frame to /tmp/tmp27eqe6ir.jpg...
Frame extraction completed successfully
ffmpeg stdout: 
ffmpeg stderr: ffmpeg version 5.1.6-0+deb12u1 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr --extra-version=0+deb12u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
-vsync is deprecated. Use -fps_mode
Passing a number to -vsync is deprecated, use a string argument as described in the manual.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/e5c1b3d9867fa3d1/tmp1utwsd9h.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Google
  Duration: 00:00:08.00, start: 0.000000, bitrate: 1665 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 1404 kb/s, 24 fps, 24 tbr, 12288 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 255 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Output #0, image2, to '/tmp/tmp27eqe6ir.jpg':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.27.100
  Stream #0:0(und): Video: mjpeg, yuvj420p(pc, progressive), 1280x720, q=2-31, 200 kb/s, 24 fps, 24 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.37.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
frame=    1 fps=0.0 q=2.0 size=N/A time=00:00:08.00 bitrate=N/A speed=21.3x    
frame=    1 fps=0.0 q=2.0 Lsize=N/A time=00:00:08.00 bitrate=N/A speed=  21x    
video:44kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Last frame extracted to: /tmp/tmp27eqe6ir.jpg
Checking for dialogue in video...
Transcribing audio from video: /tmp/e5c1b3d9867fa3d1/tmp1utwsd9h.mp4
Transcription result: Good evening. Tonight's top story is, well, it seems I've misplaced my notes.
Found dialogue: Good evening. Tonight's top story is, well, it seems I've misplaced my notes.
Calling Claude for next prompt with visual context...
Claude generated prompt: A news presenter in a navy blue suit and red tie sits at a white news desk, looking increasingly flustered as he searches through papers scattered across the desk surface. The studio lighting remains consistent with warm professional broadcast lighting against the grey and blue background. He glances nervously at the camera then back down at his desk, shuffling through the papers with visible confusion. His professional composure begins to crack as he realizes the extent of his predicament. He looks back up at the camera with a forced smile and says: "Bear with me folks, this is live television after all."
Calling Veo-3-fast for next video...
Generated video 2: /tmp/eb7dc1cb7913790a/tmpzf094xpp.mp4
Total videos so far: 2
=== Generating video 3 of 3 ===
Extracting last frame...
Video path: /tmp/eb7dc1cb7913790a/tmpzf094xpp.mp4
Video path type: <class 'replicate.use.URLPath'>
Resolved video path: /tmp/eb7dc1cb7913790a/tmpzf094xpp.mp4
Video file exists, size: 1763749 bytes
Getting total frame count...
Total frames: 192, extracting frame index: 191
Extracting exact last frame to /tmp/tmph41qp9e4.jpg...
Frame extraction completed successfully
ffmpeg stdout: 
ffmpeg stderr: ffmpeg version 5.1.6-0+deb12u1 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr --extra-version=0+deb12u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
-vsync is deprecated. Use -fps_mode
Passing a number to -vsync is deprecated, use a string argument as described in the manual.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/eb7dc1cb7913790a/tmpzf094xpp.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Google
  Duration: 00:00:08.00, start: 0.000000, bitrate: 1763 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 1502 kb/s, 24 fps, 24 tbr, 12288 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 255 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Output #0, image2, to '/tmp/tmph41qp9e4.jpg':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.27.100
  Stream #0:0(und): Video: mjpeg, yuvj420p(pc, progressive), 1280x720, q=2-31, 200 kb/s, 24 fps, 24 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.37.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
frame=    1 fps=0.0 q=2.0 size=N/A time=00:00:08.00 bitrate=N/A speed=20.2x    
frame=    1 fps=0.0 q=2.0 Lsize=N/A time=00:00:08.00 bitrate=N/A speed=19.9x    
video:50kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Last frame extracted to: /tmp/tmph41qp9e4.jpg
Checking for dialogue in video...
Transcribing audio from video: /tmp/eb7dc1cb7913790a/tmpzf094xpp.mp4
Transcription result: Bear with me, folks. This is live television after all.
Found dialogue: Bear with me, folks. This is live television after all.
Calling Claude for next prompt with visual context...
Claude generated prompt: The news presenter maintains his forced smile for a moment longer before it falters completely. He takes a deep breath, straightens his red tie, and looks directly into the camera with newfound determination. The studio lighting catches a bead of sweat on his forehead as he places both hands flat on the white desk, pushing aside the scattered papers. His expression shifts from flustered confusion to professional resolve as he leans slightly forward toward the camera. The grey and blue studio background remains consistent as he clears his throat and says with renewed confidence: "You know what? Let's just wing it tonight."
Calling Veo-3-fast for next video...
Generated video 3: /tmp/e6b991dab1bba44e/tmpn_g0i13e.mp4
Total videos so far: 3
=== FINAL: Generated 3 total videos ===
Combining 3 videos into single output
Combining 3 videos...
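
The logs above suggest a loop: generate a video, extract its exact last frame, optionally transcribe dialogue, ask Claude for the next prompt, then generate the next video. The sketch below is a rough reconstruction of that flow under stated assumptions, not the model's actual source. All function names are hypothetical, the ffprobe and ffmpeg arguments are one plausible way to get the behaviour seen in the logs (the real run used the deprecated `-vsync` option), and passing the last frame to the video model as a start image is an assumption the logs do not confirm.

```python
from typing import Callable, List, Optional


def last_frame_index(total_frames: int) -> int:
    """Frames are zero-indexed: 192 total frames gives index 191, as in the logs."""
    if total_frames < 1:
        raise ValueError("video has no frames")
    return total_frames - 1


def count_frames_cmd(video_path: str) -> List[str]:
    """ffprobe command that decodes the video stream and prints its frame count."""
    return [
        "ffprobe", "-v", "error", "-select_streams", "v:0", "-count_frames",
        "-show_entries", "stream=nb_read_frames", "-of", "csv=p=0", video_path,
    ]


def extract_frame_cmd(video_path: str, frame_index: int, out_path: str) -> List[str]:
    """ffmpeg command that writes exactly one frame, chosen by index, as a JPEG."""
    return [
        "ffmpeg", "-y", "-i", video_path,
        # The comma is escaped so the filtergraph parser does not split the filter.
        "-vf", f"select=eq(n\\,{frame_index})",
        # Assumption: -fps_mode replaces the deprecated -vsync seen in the logs.
        "-fps_mode", "passthrough",
        "-frames:v", "1", "-q:v", "2", out_path,
    ]


def chain_videos(
    initial_prompt: str,
    chain_length: int,
    generate: Callable[[str, Optional[str]], str],
    extract_last_frame: Callable[[str], str],
    next_prompt: Callable[[str, Optional[str]], str],
    transcribe: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Return the video paths in order; the callables stand in for real services."""
    videos = [generate(initial_prompt, None)]
    for _ in range(2, chain_length + 1):
        frame = extract_last_frame(videos[-1])
        # Dialogue is only collected when a transcriber is supplied (check_dialogue).
        dialogue = transcribe(videos[-1]) if transcribe else None
        prompt = next_prompt(frame, dialogue)
        # Assumption: the frame is also handed to the video model as a start image.
        videos.append(generate(prompt, frame))
    return videos
```

The command builders return argument lists suitable for `subprocess.run(cmd, check=True)`, which avoids shell quoting issues. Keeping the model calls as injected callables means the loop can be exercised with stubs before any paid API is involved.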
Version Details
Version ID
49f281ffb3251233e54d9952b7903f607022f5be04cd5338e6a6b2aae5f9be0a
Version Created
August 6, 2025