avocado/podcast-clip-generator

▶️ 24 runs 📅 Oct 2025 ⚙️ Cog 0.14.0
podcast text-to-speech text-to-video-with-audio

About

Generate a short video clip of an AI character speaking on a podcast.

Example Output

{"audio":"https://replicate.delivery/xezq/AEwx1lIif0X1GSETMJRXiCPVPI3bNeOGlr5YNfNfbryCGQPWB/tmpi8eoz5ic.mp3","image":"https://replicate.delivery/xezq/reHZofRArjjc9kGBEcw1Ov7xL0rS5ZWOdrAME9VtHd5gB0jVA/out-0.webp","video":"https://replicate.delivery/xezq/G7odbwSf8n1UKiSOvt9Qbizie6nJLrn8psHO58f8zOqADoHrA/output.mp4","script":"Hi everyone! So today I wanna tell you how to run a restaurant because I figured it out at the playground yesterday. First, you gotta have a really long line like the slide, and make everyone wait their turn even if they're crying. That's how you know it's popular! Then, just like when Tommy hogged all the sandbox toys, you should only give people one fork and make them share. It saves money! Oh, and the most important thing - if someone asks for something you don't have, just throw sand at them and run away. That's what I do when kids want my juice box. Also, make sure to close randomly for nap time because grown-ups need naps too, even if people are still hungry. Trust me, I'm three and I know these things. The playground taught me that if you're loud enough and run around a lot, people will pay attention to you, and that's basically what restaurants do, right? Anyway, that's my business advice. Now I gotta go eat some goldfish crackers. Bye!"}

Performance Metrics

1410.95s Prediction Time
1411.35s Total Time
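
To put these timings in context, the short sketch below relates the prediction time to the 57.82-second audio duration reported in the example execution logs. The function names are hypothetical and exist only for this illustration.

```python
def realtime_factor(prediction_seconds: float, media_seconds: float) -> float:
    """Seconds of compute spent per second of generated media."""
    if media_seconds <= 0:
        raise ValueError("media_seconds must be positive")
    return prediction_seconds / media_seconds


def queue_overhead(total_seconds: float, prediction_seconds: float) -> float:
    """Time spent outside the prediction itself (total minus prediction)."""
    return total_seconds - prediction_seconds


# Values taken from the metrics and the logged audio duration of this run.
factor = realtime_factor(1410.95, 57.82)
overhead = queue_overhead(1411.35, 1410.95)
print(f"{factor:.1f}x slower than real time, {overhead:.2f}s overhead")
```

For this run the clip took roughly 24 seconds of compute per second of audio, and almost all of the total time was spent inside the prediction.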
All Input Parameters
{
  "voice_id": "Deep_Voice_Man",
  "podcast_style": "professional studio podcast with a single host speaking into a microphone, modern setup, warm lighting, restaurant background",
  "podcaster_prompt": "You are a toddler giving questionable advice on how to run a restaurants based on an experience at a playground"
}
Input Parameters
voice_id Type: string, Default: Friendly_Person
Voice ID for the podcast host. Options: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
podcast_style Type: string, Default: professional studio podcast with a single host speaking into a microphone, modern setup, warm lighting
Description of the podcast setting/style for image generation
podcaster_prompt (required) Type: string
Prompt describing what the podcaster should tell an anecdote about
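
As a minimal sketch of how the three parameters above fit together, the helper below builds an input dictionary and checks it on the client side. The function name `build_input` and the validation itself are assumptions made for illustration; the page does not say whether the model rejects unknown voice IDs.

```python
from typing import Dict

# Voice options copied from the voice_id description above.
VOICE_IDS = frozenset({
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
})

# Defaults as documented in the parameter list.
DEFAULT_VOICE_ID = "Friendly_Person"
DEFAULT_PODCAST_STYLE = (
    "professional studio podcast with a single host speaking into a "
    "microphone, modern setup, warm lighting"
)


def build_input(
    podcaster_prompt: str,
    voice_id: str = DEFAULT_VOICE_ID,
    podcast_style: str = DEFAULT_PODCAST_STYLE,
) -> Dict[str, str]:
    """Build the input payload (hypothetical helper, not part of the model)."""
    if not podcaster_prompt.strip():
        raise ValueError("podcaster_prompt is required and must not be empty")
    if voice_id not in VOICE_IDS:
        raise ValueError(f"unknown voice_id: {voice_id!r}")
    return {
        "voice_id": voice_id,
        "podcast_style": podcast_style,
        "podcaster_prompt": podcaster_prompt,
    }


payload = build_input(
    "You are a barista explaining espresso to a cat",
    voice_id="Deep_Voice_Man",
)
print(payload)
```

With Replicate's Python client installed, a dictionary like this could presumably be passed as the `input` argument of `replicate.run`, together with the model name and the version ID listed on this page.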
Output Schema
audio Type: string, Format: uri
Audio
image Type: string, Format: uri
Image
video Type: string, Format: uri
Video
script Type: string
Script
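
The schema above can be checked when reading a prediction result. The following is a sketch under stated assumptions: the `ClipOutput` class and `parse_output` function are hypothetical names, and the sample URLs are placeholders rather than real outputs.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlparse

URI_FIELDS = ("audio", "image", "video")
ALL_FIELDS = URI_FIELDS + ("script",)


@dataclass(frozen=True)
class ClipOutput:
    """Mirror of the four fields in the output schema."""
    audio: str
    image: str
    video: str
    script: str


def parse_output(raw: str) -> ClipOutput:
    """Parse a JSON result and verify it matches the documented schema."""
    data = json.loads(raw)
    missing = [key for key in ALL_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing output fields: {missing}")
    for key in URI_FIELDS:
        parsed = urlparse(data[key])
        # The schema declares these fields as URIs; require an http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{key} is not an http(s) URL: {data[key]!r}")
    return ClipOutput(**{key: data[key] for key in ALL_FIELDS})


# Placeholder result with the same shape as the example output.
sample = json.dumps({
    "audio": "https://example.com/clip.mp3",
    "image": "https://example.com/host.webp",
    "video": "https://example.com/output.mp4",
    "script": "Hi everyone!",
})
clip = parse_output(sample)
print(clip.video)
```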
Example Execution Logs
Generating podcast script...
Script generated (958 characters)
Generating speech audio...
Audio generated: /tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3
Generating podcast host image...
Image generated: /tmp/cog-runner-tmp-3902082295/646e6ed94550071c/out-0.webp
Getting audio duration...
Audio duration: 57.82 seconds
Splitting audio into 4 chunks...
ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 13.2.1 (Alpine 13.2.1_git20240309) 20240309
  configuration: --pkg-config-flags=--static --extra-cflags=-fopenmp --extra-ldflags='-fopenmp -Wl,--allow-multiple-definition -Wl,-z,stack-size=2097152' --toolchain=hardened --disable-debug --disable-shared --disable-ffplay --enable-static --enable-gpl --enable-version3 --enable-fontconfig --enable-gray --enable-iconv --enable-lcms2 --enable-libaom --enable-libaribb24 --enable-libass --enable-libbluray --enable-libdav1d --enable-libdavs2 --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libjxl --enable-libkvazaar --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librabbitmq --enable-librav1e --enable-librsvg --enable-librtmp --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libuavs3d --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpl --enable-libvpx --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxevd --enable-libxeve --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-openssl
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.101 / 61. 19.101
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
[mp3 @ 0x76f647ad4ac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
  Metadata:
    encoder         : Lavf58.29.100
  Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_0.mp3':
  Metadata:
    TSSE            : Lavf61.7.100
  Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x76f647ad1580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.258127%
size=     235KiB time=00:00:15.01 bitrate= 128.3kbits/s speed=1.11e+03x    
Created chunk 1/4: /tmp/audio_chunk_0.mp3
[mp3 @ 0x78159a20dac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
  Metadata:
    encoder         : Lavf58.29.100
  Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_1.mp3':
  Metadata:
    TSSE            : Lavf61.7.100
  Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x78159a20a580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.257509%
size=     236KiB time=00:00:15.02 bitrate= 128.5kbits/s speed=1.27e+03x    
Created chunk 2/4: /tmp/audio_chunk_1.mp3
[mp3 @ 0x74bc6eff3ac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
  Metadata:
    encoder         : Lavf58.29.100
  Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_2.mp3':
  Metadata:
    TSSE            : Lavf61.7.100
  Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x74bc6eff0580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.258127%
size=     235KiB time=00:00:15.00 bitrate= 128.4kbits/s speed=2.73e+03x    
Created chunk 3/4: /tmp/audio_chunk_2.mp3
[mp3 @ 0x7f7ef6e5aac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
  Metadata:
    encoder         : Lavf58.29.100
  Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_3.mp3':
  Metadata:
    TSSE            : Lavf61.7.100
  Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x7f7ef6e57580] video:0KiB audio:200KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.302356%
size=     201KiB time=00:00:12.81 bitrate= 128.4kbits/s speed=2.19e+03x    
Created chunk 4/4: /tmp/audio_chunk_3.mp3
Generating 4 video chunks sequentially...
Generating video for chunk 1/4...
Chunk 1/4 completed: /tmp/cog-runner-tmp-3902082295/c12738282ea44557/tmpeoirz5w7.mp4
Generating video for chunk 2/4...
Chunk 2/4 completed: /tmp/cog-runner-tmp-3902082295/2fd23432e3461a5a/tmps0w61jrn.mp4
Generating video for chunk 3/4...
Chunk 3/4 completed: /tmp/cog-runner-tmp-3902082295/fc3430f4df3d2a0b/tmprr4n_s3o.mp4
Generating video for chunk 4/4...
Chunk 4/4 completed: /tmp/cog-runner-tmp-3902082295/74977905a3a3e7fd/tmpktyd0jeq.mp4
Merging video chunks together...
Final video merged: /tmp/cog-runner-tmp-3902082295/ed7c93a37889273e/output.mp4
Adding captions to video...
Captions added: /tmp/cog-runner-tmp-3902082295/772f43d71fc8a9b1/output.mp4
Pipeline complete!
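
The logs show a 57.82-second audio file being split into four chunks of about 15 seconds each using an ffmpeg stream copy. The sketch below reproduces that plan. The 15-second chunk length is inferred from the logged chunk durations, and the function names and exact ffmpeg arguments are assumptions; the pipeline's actual command line is not shown in the logs.

```python
import math
from typing import List, Tuple


def plan_chunks(duration: float, chunk_seconds: float = 15.0) -> List[Tuple[float, float]]:
    """Return (start, length) pairs covering the audio in fixed-size chunks."""
    if duration <= 0 or chunk_seconds <= 0:
        raise ValueError("duration and chunk_seconds must be positive")
    count = math.ceil(duration / chunk_seconds)
    chunks = []
    for i in range(count):
        start = i * chunk_seconds
        # The final chunk holds whatever audio remains.
        length = min(chunk_seconds, duration - start)
        chunks.append((start, round(length, 2)))
    return chunks


def ffmpeg_split_args(src: str, index: int, start: float, length: float) -> List[str]:
    """Build an ffmpeg argument list that copies one chunk without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",   # seek to the chunk start
        "-t", f"{length:.2f}",   # limit the chunk duration
        "-i", src,
        "-c", "copy",            # stream copy, matching "(copy)" in the logs
        f"/tmp/audio_chunk_{index}.mp3",
    ]


for index, (start, length) in enumerate(plan_chunks(57.82)):
    print(" ".join(ffmpeg_split_args("input.mp3", index, start, length)))
```

The planned final chunk is 12.82 seconds, while the log reports 12.81 seconds. Small differences like this are presumably due to a stream copy cutting on MP3 frame boundaries rather than at exact timestamps.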
Version Details
Version ID
06fab6d64e7c733eaa41dd03ee117c4169b2a7f655d81851264f6762743dd93b
Version Created
October 26, 2025