avocado/podcast-clip-generator
About
Generate a short clip of an AI character speaking on a podcast
Example Output
{"audio":"https://replicate.delivery/xezq/AEwx1lIif0X1GSETMJRXiCPVPI3bNeOGlr5YNfNfbryCGQPWB/tmpi8eoz5ic.mp3","image":"https://replicate.delivery/xezq/reHZofRArjjc9kGBEcw1Ov7xL0rS5ZWOdrAME9VtHd5gB0jVA/out-0.webp","video":"https://replicate.delivery/xezq/G7odbwSf8n1UKiSOvt9Qbizie6nJLrn8psHO58f8zOqADoHrA/output.mp4","script":"Hi everyone! So today I wanna tell you how to run a restaurant because I figured it out at the playground yesterday. First, you gotta have a really long line like the slide, and make everyone wait their turn even if they're crying. That's how you know it's popular! Then, just like when Tommy hogged all the sandbox toys, you should only give people one fork and make them share. It saves money! Oh, and the most important thing - if someone asks for something you don't have, just throw sand at them and run away. That's what I do when kids want my juice box. Also, make sure to close randomly for nap time because grown-ups need naps too, even if people are still hungry. Trust me, I'm three and I know these things. The playground taught me that if you're loud enough and run around a lot, people will pay attention to you, and that's basically what restaurants do, right? Anyway, that's my business advice. Now I gotta go eat some goldfish crackers. Bye!"}
Performance Metrics
1410.95s
Prediction Time
1411.35s
Total Time
All Input Parameters
{
"voice_id": "Deep_Voice_Man",
"podcast_style": "professional studio podcast with a single host speaking into a microphone, modern setup, warm lighting, restaurant background",
"podcaster_prompt": "You are a toddler giving questionable advice on how to run a restaurants based on an experience at a playground"
}
Input Parameters
- voice_id
- Voice ID for the podcast host. Options: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
- podcast_style
- Description of the podcast setting/style for image generation
- podcaster_prompt (required)
- Prompt for the podcaster to generate an anecdote about
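The parameters above can be assembled into a request payload before calling the model. The following is a minimal sketch, not the model author's code: `build_input` and `run_model` are hypothetical helper names, the voice list is copied from the options above, and the version hash is the one listed under Version Details. Defaults for the optional parameters are not documented here, so the sketch simply omits any optional value that is not supplied. The actual call assumes the third-party `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set.

```python
# Voice options copied from the voice_id description above.
VOICE_IDS = {
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
}

# Model reference in "owner/name:version" form, using the listed version ID.
MODEL_VERSION = (
    "avocado/podcast-clip-generator:"
    "06fab6d64e7c733eaa41dd03ee117c4169b2a7f655d81851264f6762743dd93b"
)


def build_input(podcaster_prompt, voice_id=None, podcast_style=None):
    """Build the input dict; only podcaster_prompt is required."""
    if not podcaster_prompt or not podcaster_prompt.strip():
        raise ValueError("podcaster_prompt is required")
    payload = {"podcaster_prompt": podcaster_prompt}
    if voice_id is not None:
        if voice_id not in VOICE_IDS:
            raise ValueError(f"unknown voice_id: {voice_id!r}")
        payload["voice_id"] = voice_id
    if podcast_style is not None:
        payload["podcast_style"] = podcast_style
    return payload


def run_model(payload):
    """Send the payload to Replicate (network call; needs an API token)."""
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL_VERSION, input=payload)
```

Validating `voice_id` locally avoids spending a long prediction on a request the model would reject; the example run above took over 23 minutes.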
Output Schema
- audio
- Audio
- image
- Image
- video
- Video
- script
- Script
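A caller may want to check that all four output fields are present before using them. The sketch below assumes the output arrives as a dict of strings, as in the Example Output; `PodcastClip` and `parse_output` are hypothetical names introduced for illustration. Depending on the client library and its version, file outputs may be returned as objects rather than plain URL strings, so the values are passed through `str()`.

```python
from dataclasses import dataclass

# Field names taken from the Output Schema above.
OUTPUT_FIELDS = ("audio", "image", "video", "script")


@dataclass(frozen=True)
class PodcastClip:
    audio: str   # URL of the generated speech (MP3 in the example)
    image: str   # URL of the host image (WebP in the example)
    video: str   # URL of the final captioned video (MP4 in the example)
    script: str  # Generated script text


def parse_output(output):
    """Validate that every schema field is present and wrap it in a PodcastClip."""
    missing = [name for name in OUTPUT_FIELDS if name not in output]
    if missing:
        raise KeyError(f"output is missing fields: {', '.join(missing)}")
    return PodcastClip(**{name: str(output[name]) for name in OUTPUT_FIELDS})
```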
Example Execution Logs
Generating podcast script...
Script generated (958 characters)
Generating speech audio...
Audio generated: /tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3
Generating podcast host image...
Image generated: /tmp/cog-runner-tmp-3902082295/646e6ed94550071c/out-0.webp
Getting audio duration...
Audio duration: 57.82 seconds
Splitting audio into 4 chunks...
ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
built with gcc 13.2.1 (Alpine 13.2.1_git20240309) 20240309
configuration: --pkg-config-flags=--static --extra-cflags=-fopenmp --extra-ldflags='-fopenmp -Wl,--allow-multiple-definition -Wl,-z,stack-size=2097152' --toolchain=hardened --disable-debug --disable-shared --disable-ffplay --enable-static --enable-gpl --enable-version3 --enable-fontconfig --enable-gray --enable-iconv --enable-lcms2 --enable-libaom --enable-libaribb24 --enable-libass --enable-libbluray --enable-libdav1d --enable-libdavs2 --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libjxl --enable-libkvazaar --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librabbitmq --enable-librav1e --enable-librsvg --enable-librtmp --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libuavs3d --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpl --enable-libvpx --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxevd --enable-libxeve --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-openssl
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.101 / 61. 19.101
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
[mp3 @ 0x76f647ad4ac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_0.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x76f647ad1580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.258127%
size= 235KiB time=00:00:15.01 bitrate= 128.3kbits/s speed=1.11e+03x
Created chunk 1/4: /tmp/audio_chunk_0.mp3
[mp3 @ 0x78159a20dac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_1.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x78159a20a580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.257509%
size= 236KiB time=00:00:15.02 bitrate= 128.5kbits/s speed=1.27e+03x
Created chunk 2/4: /tmp/audio_chunk_1.mp3
[mp3 @ 0x74bc6eff3ac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_2.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x74bc6eff0580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.258127%
size= 235KiB time=00:00:15.00 bitrate= 128.4kbits/s speed=2.73e+03x
Created chunk 3/4: /tmp/audio_chunk_2.mp3
[mp3 @ 0x7f7ef6e5aac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3902082295/be46ddeca6eb8615/tmpi8eoz5ic.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:57.82, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_3.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x7f7ef6e57580] video:0KiB audio:200KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.302356%
size= 201KiB time=00:00:12.81 bitrate= 128.4kbits/s speed=2.19e+03x
Created chunk 4/4: /tmp/audio_chunk_3.mp3
Generating 4 video chunks sequentially...
Generating video for chunk 1/4...
Chunk 1/4 completed: /tmp/cog-runner-tmp-3902082295/c12738282ea44557/tmpeoirz5w7.mp4
Generating video for chunk 2/4...
Chunk 2/4 completed: /tmp/cog-runner-tmp-3902082295/2fd23432e3461a5a/tmps0w61jrn.mp4
Generating video for chunk 3/4...
Chunk 3/4 completed: /tmp/cog-runner-tmp-3902082295/fc3430f4df3d2a0b/tmprr4n_s3o.mp4
Generating video for chunk 4/4...
Chunk 4/4 completed: /tmp/cog-runner-tmp-3902082295/74977905a3a3e7fd/tmpktyd0jeq.mp4
Merging video chunks together...
Final video merged: /tmp/cog-runner-tmp-3902082295/ed7c93a37889273e/output.mp4
Adding captions to video...
Captions added: /tmp/cog-runner-tmp-3902082295/772f43d71fc8a9b1/output.mp4
Pipeline complete!
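The audio-splitting step in the logs above can be sketched as follows. This is an illustration inferred from the log output, not the model's actual source: the 15-second chunk length is an assumption based on the reported chunk times (about 15 s each, with a shorter final chunk), and `chunk_bounds` and `ffmpeg_split_commands` are hypothetical names. The commands use stream copy (`-c copy`), matching the `(copy)` stream mapping in the logs, and are only built here, not executed.

```python
import math


def chunk_bounds(duration_s, chunk_s=15.0):
    """Return (start, length) pairs that cover the audio in chunk_s steps."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    count = math.ceil(duration_s / chunk_s)
    bounds = []
    for i in range(count):
        start = i * chunk_s
        # The last chunk is shorter when the duration is not a multiple of chunk_s.
        length = min(chunk_s, duration_s - start)
        bounds.append((start, length))
    return bounds


def ffmpeg_split_commands(src, duration_s, chunk_s=15.0,
                          out_pattern="/tmp/audio_chunk_{i}.mp3"):
    """Build one ffmpeg argument list per chunk, copying the stream without re-encoding."""
    commands = []
    for i, (start, length) in enumerate(chunk_bounds(duration_s, chunk_s)):
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{start:.2f}",   # seek to the chunk start
            "-t", f"{length:.2f}",   # limit the chunk duration
            "-i", src,
            "-c", "copy",            # no re-encoding
            out_pattern.format(i=i),
        ])
    return commands
```

For the 57.82-second audio in this run, `chunk_bounds(57.82)` yields four chunks starting at 0, 15, 30, and 45 seconds. With stream copy, cuts land on MP3 frame boundaries, which would explain why the logged chunk times differ slightly from the nominal lengths.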
Version Details
- Version ID
- 06fab6d64e7c733eaa41dd03ee117c4169b2a7f655d81851264f6762743dd93b
- Version Created
- October 26, 2025