avocado/podcast-generator-wan
About
Forked from avocado/podcast-generator
Example Output
Output
{"audio":"https://replicate.delivery/xezq/vWaeZAj304TWAiwft6HxBB80xgDFUN8YvQeeBd1NVUr7vFLWB/tmp6kbjz4_9.mp3","image":"https://replicate.delivery/xezq/Og0Ei5m5epSZGCYfe9hoYjuV4bDXCLIDhXfQyOLgjP45vFLWB/out-0.webp","video":"https://replicate.delivery/xezq/Y1uSOyThPyJNMZuYSQVsmN48ZPNWgfpxps93KTtZ1RAfbxiVA/output.mp4","script":"So yesterday I was at the playground, right? And I learned something super important that I think all grown-ups need to know. I was on the big twisty slide, the really tall one that makes your tummy feel funny, when this kid Tyler started crying because he was scared to go down.
And you know what I did? I told him the secret. You just gotta close your eyes, hold your breath, and think about cookies the whole way down. Works every time! Tyler tried it and guess what happened? He went flying down that slide laughing his head off.
Now here's the thing - I think this works for everything scary that grown-ups do too. Like when my mom has to talk to mean people on the phone, or when dad has to fix the scary noise the car makes. Just close your eyes, hold your breath, and think about cookies.
I mean, it got Tyler down the slide, and last week it helped me eat my vegetables, so basically I'm pretty sure I've figured out life. You're welcome, everybody."}
All Input Parameters
{
"voice_id": "Casual_Guy",
"story_prompt": "a toddler giving questionable advice based on an experience at a playground",
"podcast_style": "a boy toddler professional studio podcast with a single host speaking into a microphone, modern setup, warm lighting"
}
Input Parameters
- voice_id
- Voice ID for the podcast host. Options: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
- story_prompt (required)
- Story prompt to generate a podcast about
- podcast_style
- Description of the podcast setting/style for image generation
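The parameter list above can be turned into a small client-side check before a prediction is requested. The sketch below is only an illustration: `build_input` and `VOICE_IDS` are hypothetical names, and the voice options are copied from the list above. Because the page does not state defaults for the optional fields, they are left out of the payload when not supplied.

```python
# Voice options copied from the voice_id description above.
VOICE_IDS = {
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
}


def build_input(story_prompt, voice_id=None, podcast_style=None):
    """Hypothetical helper: assemble and validate the model input dict."""
    # story_prompt is the only field marked as required.
    if not story_prompt or not story_prompt.strip():
        raise ValueError("story_prompt is required")
    payload = {"story_prompt": story_prompt}
    # Optional fields are only included when the caller supplies them.
    if voice_id is not None:
        if voice_id not in VOICE_IDS:
            raise ValueError(f"unknown voice_id: {voice_id!r}")
        payload["voice_id"] = voice_id
    if podcast_style is not None:
        payload["podcast_style"] = podcast_style
    return payload


# Same values as the "All Input Parameters" example.
example_input = build_input(
    story_prompt="a toddler giving questionable advice based on an experience at a playground",
    voice_id="Casual_Guy",
    podcast_style=(
        "a boy toddler professional studio podcast with a single host "
        "speaking into a microphone, modern setup, warm lighting"
    ),
)
```

With the `replicate` Python client installed and an API token configured, a payload like this would typically be passed as `replicate.run("avocado/podcast-generator-wan:<version id>", input=example_input)`, using the version ID listed under Version Details.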
Output Schema
- audio
- Audio
- image
- Image
- video
- Video
- script
- Script
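A caller might want to check that all four fields of the schema above are present before using a result. The following is a minimal sketch: `PodcastOutput` and `parse_output` are hypothetical names, and the URLs in the sample are placeholders rather than real outputs. The script in the example output contains literal line breaks, so the sketch decodes with `strict=False`, which lets `json.loads` accept control characters inside strings.

```python
import json
from dataclasses import dataclass

FIELDS = ("audio", "image", "video", "script")


@dataclass(frozen=True)
class PodcastOutput:
    audio: str   # URL of the generated speech audio
    image: str   # URL of the generated host image
    video: str   # URL of the final video
    script: str  # generated script text


def parse_output(raw):
    """Accept a JSON string or a decoded mapping and check the four fields."""
    # strict=False tolerates raw newlines inside the script string.
    data = json.loads(raw, strict=False) if isinstance(raw, str) else dict(raw)
    missing = [name for name in FIELDS if name not in data]
    if missing:
        raise KeyError(f"missing output fields: {missing}")
    return PodcastOutput(**{name: data[name] for name in FIELDS})


# Placeholder payload with a raw newline in the script, as in the example output.
sample = (
    '{"audio": "https://example.com/a.mp3", '
    '"image": "https://example.com/i.webp", '
    '"video": "https://example.com/v.mp4", '
    '"script": "line one\nline two"}'
)
result = parse_output(sample)
```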
Example Execution Logs
Generating podcast script...
Script generated (963 characters)
Generating speech audio...
Audio generated: /tmp/cog-runner-tmp-3494772783/2636c3b8baac5a52/tmp6kbjz4_9.mp3
Generating podcast host image...
Image generated: /tmp/cog-runner-tmp-3494772783/bf48b9294ee1f048/out-0.webp
Getting audio duration...
Audio duration: 52.67 seconds
Splitting audio into 4 chunks...
ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
built with gcc 13.2.1 (Alpine 13.2.1_git20240309) 20240309
configuration: --pkg-config-flags=--static --extra-cflags=-fopenmp --extra-ldflags='-fopenmp -Wl,--allow-multiple-definition -Wl,-z,stack-size=2097152' --toolchain=hardened --disable-debug --disable-shared --disable-ffplay --enable-static --enable-gpl --enable-version3 --enable-fontconfig --enable-gray --enable-iconv --enable-lcms2 --enable-libaom --enable-libaribb24 --enable-libass --enable-libbluray --enable-libdav1d --enable-libdavs2 --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libjxl --enable-libkvazaar --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librabbitmq --enable-librav1e --enable-librsvg --enable-librtmp --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libuavs3d --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpl --enable-libvpx --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxevd --enable-libxeve --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-openssl
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.101 / 61. 19.101
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
[mp3 @ 0x796df42abac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3494772783/2636c3b8baac5a52/tmp6kbjz4_9.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:52.67, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_0.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x796df42a8580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.258127%
size= 235KiB time=00:00:15.01 bitrate= 128.3kbits/s speed=2.24e+03x
Created chunk 1/4: /tmp/audio_chunk_0.mp3
[mp3 @ 0x7d9c15d3aac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3494772783/2636c3b8baac5a52/tmp6kbjz4_9.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:52.67, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_1.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x7d9c15d37580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.257509%
size= 236KiB time=00:00:15.02 bitrate= 128.5kbits/s speed=2.24e+03x
Created chunk 2/4: /tmp/audio_chunk_1.mp3
[mp3 @ 0x710708f4cac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3494772783/2636c3b8baac5a52/tmp6kbjz4_9.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:52.67, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_2.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x710708f49580] video:0KiB audio:235KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.258127%
size= 235KiB time=00:00:15.00 bitrate= 128.4kbits/s speed=1.92e+03x
Created chunk 3/4: /tmp/audio_chunk_2.mp3
[mp3 @ 0x7cdc5efebac0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/cog-runner-tmp-3494772783/2636c3b8baac5a52/tmp6kbjz4_9.mp3':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:52.67, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 32000 Hz, mono, fltp, 128 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Output #0, mp3, to '/tmp/audio_chunk_3.mp3':
Metadata:
TSSE : Lavf61.7.100
Stream #0:0: Audio: mp3, 32000 Hz, mono, fltp, 128 kb/s
Press [q] to stop, [?] for help
[out#0/mp3 @ 0x7cdc5efe8580] video:0KiB audio:120KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.505347%
size= 120KiB time=00:00:07.66 bitrate= 128.6kbits/s speed=3.14e+03x
Created chunk 4/4: /tmp/audio_chunk_3.mp3
Starting 4 video generation jobs in parallel...
Starting video generation for chunk 1/4...
Chunk 1/4 job started
Starting video generation for chunk 2/4...
Chunk 2/4 job started
Starting video generation for chunk 3/4...
Chunk 3/4 job started
Starting video generation for chunk 4/4...
Chunk 4/4 job started
Waiting for all 4 videos to complete...
Waiting for chunk 1/4...
Chunk 1/4 completed: /tmp/cog-runner-tmp-3494772783/1a1a665addce1a35/output.mp4
Waiting for chunk 2/4...
Chunk 2/4 completed: /tmp/cog-runner-tmp-3494772783/fd4891c5b91e728d/output.mp4
Waiting for chunk 3/4...
Chunk 3/4 completed: /tmp/cog-runner-tmp-3494772783/24ece0ea582576c5/output.mp4
Waiting for chunk 4/4...
Chunk 4/4 completed: /tmp/cog-runner-tmp-3494772783/6bd8919b98f40a43/output.mp4
Merging video chunks together...
Final video merged: /tmp/cog-runner-tmp-3494772783/6f484fc5965778a1/output.mp4
Adding captions to video...
Captions added: /tmp/cog-runner-tmp-3494772783/f2a7f2a728db9acc/output.mp4
Pipeline complete!
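The log shows a 52.67 second audio file split into four chunks, the first three about 15 seconds long, using stream copy. The sketch below reproduces that arithmetic under stated assumptions: the 15 second window is inferred from the log rather than documented, `plan_chunks` and `ffmpeg_split_args` are hypothetical helpers, and the ffmpeg arguments are one common way to cut without re-encoding, not necessarily the command the model runs.

```python
import math


def plan_chunks(duration, chunk_seconds=15.0):
    """Return (start, length) pairs covering `duration` in fixed windows."""
    if duration <= 0 or chunk_seconds <= 0:
        raise ValueError("duration and chunk_seconds must be positive")
    count = math.ceil(duration / chunk_seconds)
    plan = []
    for index in range(count):
        start = index * chunk_seconds
        # The last window is shortened to whatever audio remains.
        length = min(chunk_seconds, duration - start)
        plan.append((start, round(length, 2)))
    return plan


def ffmpeg_split_args(src, start, length, dst):
    """Build an argument list that cuts one chunk with stream copy."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",   # seek to the chunk start
        "-i", src,
        "-t", f"{length:.2f}",   # limit the output duration
        "-c", "copy",            # copy the stream, no re-encode
        dst,
    ]


chunks = plan_chunks(52.67)
commands = [
    ffmpeg_split_args("input.mp3", start, length, f"/tmp/audio_chunk_{i}.mp3")
    for i, (start, length) in enumerate(chunks)
]
```

For a 52.67 second input this plans starts at 0, 15, 30 and 45 seconds, with a final chunk of 7.67 seconds. The log reports 00:00:07.66 for the last chunk; the small difference presumably comes from stream copy cutting on MP3 frame boundaries.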
Version Details
- Version ID
- 7e8ed72bb89f952c84f0ea66aa0711fb698fce2b0821526db1cde91719bfe05a
- Version Created
- October 25, 2025