lagune870601/sonic_cc 🔢🖼️✓ → 🖼️
About
sonic_cc
Example Output
Output
Performance Metrics
- Prediction Time: 396.41 s
- Total Time: 456.85 s
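The gap between the two figures is time spent outside the prediction itself. The page does not say what that time covers, so treating it as queueing, container start-up, or file transfer is only an assumption. A minimal check of the arithmetic, using only the numbers reported above:

```python
# Figures copied from the Performance Metrics on this page.
prediction_time_s = 396.41
total_time_s = 456.85

# Time not accounted for by the prediction itself.
# What it covers (queueing, setup, upload) is not stated on the page.
overhead_s = round(total_time_s - prediction_time_s, 2)
overhead_share = overhead_s / total_time_s

print(f"overhead: {overhead_s} s ({overhead_share:.1%} of total)")
```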
All Input Parameters
{
  "audio": "https://replicate.delivery/pbxt/Nm49saZpPThIv9WRi6Z67k0oXBNoyS3Yw3V3pAufR2EqC2QF/1_song_audio.mp3",
  "image": "https://replicate.delivery/pbxt/Nm49sEWxE9ttqF3uVRWc5HUDHxMBqGiknEBCKJGh56MadrKz/c3e9ece9-eccf-4887-a45b-0e4d6ac7015d.jpeg",
  "crop_image": false,
  "dynamic_scale": 1,
  "min_resolution": 512,
  "inference_steps": 25,
  "keep_resolution": false
}
Input Parameters
- seed: Random seed for reproducible results. Leave blank to use a random seed.
- audio (required): Input audio file (WAV, MP3, etc.) that provides the voice.
- image (required): Input portrait image (will be cropped if a face is detected).
- crop_image: If true, crop the image so that only the head region remains.
- dynamic_scale: Controls movement intensity. Increase it for more movement, decrease it for less.
- min_resolution: Minimum image resolution for processing. Lower values use less memory but may reduce quality.
- inference_steps: Number of diffusion steps. Higher values may improve quality but take longer.
- keep_resolution: If true, the output video matches the original image resolution. Otherwise, the output uses min_resolution after cropping.
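The parameters above can be collected into an input payload before a prediction is requested. The following is a minimal sketch, not the model author's code. It assumes the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set. `SonicInput`, `to_payload`, and `run_sonic` are hypothetical names introduced for illustration. The default values are copied from the example input on this page and are not confirmed to be the model's real defaults, and the range checks are assumptions as well.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

# Model reference built from the Version ID listed on this page.
MODEL_REF = (
    "lagune870601/sonic_cc:"
    "a8b068b27f183677c4dbd0f58574cbc07bb7f496cd1b3750bcd37586f1421abd"
)


@dataclass
class SonicInput:
    """Hypothetical container mirroring the documented input parameters."""

    audio: str  # URL or path of the audio file (required)
    image: str  # URL or path of the portrait image (required)
    seed: Optional[int] = None  # None means "let the model pick a seed"
    # The values below come from the example input, not from a confirmed schema.
    crop_image: bool = False
    dynamic_scale: float = 1.0
    min_resolution: int = 512
    inference_steps: int = 25
    keep_resolution: bool = False

    def to_payload(self) -> Dict[str, Any]:
        # Assumed sanity checks; the model may accept or reject other ranges.
        if not self.audio or not self.image:
            raise ValueError("audio and image are required")
        if self.inference_steps < 1:
            raise ValueError("inference_steps must be at least 1")
        if self.min_resolution < 1:
            raise ValueError("min_resolution must be positive")
        payload = asdict(self)
        if payload["seed"] is None:
            # Leave the seed out entirely so a random one is used.
            del payload["seed"]
        return payload


def run_sonic(inputs: SonicInput) -> Any:
    """Send the payload to Replicate (network access and API token required)."""
    import replicate  # imported lazily so the payload helpers work offline

    return replicate.run(MODEL_REF, input=inputs.to_payload())
```

Building the payload separately from the network call keeps the validation testable offline; only `run_sonic` needs credentials.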
Output Schema
Output
Example Execution Logs
Starting prediction...
Saved input image to: /src/tmp_path/input_image.png
Converted and saved audio to: /src/tmp_path/input_audio.wav
Preprocessing image...
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Face detection result: 1 face(s) found
Using original image for processing (no face detected)
Generating talking face animation...
0%| | 0/186 [00:00<?, ?it/s]
87%|████████▋ | 161/186 [00:00<00:00, 1609.53it/s]
100%|██████████| 186/186 [00:00<00:00, 1630.35it/s]
0%| | 0/25 [00:00<?, ?it/s]
4%|▍ | 1/25 [00:12<05:07, 12.81s/it]
8%|▊ | 2/25 [00:26<05:05, 13.26s/it]
12%|█▏ | 3/25 [00:40<04:55, 13.43s/it]
16%|█▌ | 4/25 [00:53<04:43, 13.51s/it]
20%|██ | 5/25 [01:07<04:31, 13.56s/it]
24%|██▍ | 6/25 [01:20<04:18, 13.58s/it]
28%|██▊ | 7/25 [01:34<04:04, 13.60s/it]
32%|███▏ | 8/25 [01:48<03:51, 13.62s/it]
36%|███▌ | 9/25 [02:01<03:38, 13.63s/it]
40%|████ | 10/25 [02:15<03:24, 13.64s/it]
44%|████▍ | 11/25 [02:29<03:11, 13.65s/it]
48%|████▊ | 12/25 [02:42<02:57, 13.65s/it]
52%|█████▏ | 13/25 [02:56<02:43, 13.65s/it]
56%|█████▌ | 14/25 [03:10<02:30, 13.66s/it]
60%|██████ | 15/25 [03:23<02:16, 13.66s/it]
64%|██████▍ | 16/25 [03:37<02:02, 13.65s/it]
68%|██████▊ | 17/25 [03:51<01:49, 13.65s/it]
72%|███████▏ | 18/25 [04:04<01:35, 13.66s/it]
76%|███████▌ | 19/25 [04:18<01:21, 13.66s/it]
80%|████████ | 20/25 [04:32<01:08, 13.65s/it]
84%|████████▍ | 21/25 [04:45<00:54, 13.65s/it]
88%|████████▊ | 22/25 [04:59<00:40, 13.65s/it]
92%|█████████▏| 23/25 [05:13<00:27, 13.65s/it]
96%|█████████▌| 24/25 [05:26<00:13, 13.65s/it]
100%|██████████| 25/25 [05:40<00:00, 13.64s/it]
100%|██████████| 25/25 [05:40<00:00, 13.61s/it]
0% 0/185 [00:00<?, ?it/s]
4% 7/185 [00:00<00:02, 63.60it/s]
9% 16/185 [00:00<00:02, 77.96it/s]
14% 25/185 [00:00<00:01, 82.84it/s]
18% 34/185 [00:00<00:01, 85.22it/s]
23% 43/185 [00:00<00:01, 86.55it/s]
28% 52/185 [00:00<00:01, 87.36it/s]
33% 61/185 [00:00<00:01, 87.91it/s]
38% 70/185 [00:00<00:01, 88.23it/s]
43% 79/185 [00:00<00:01, 88.45it/s]
48% 88/185 [00:01<00:01, 88.58it/s]
52% 97/185 [00:01<00:00, 88.71it/s]
57% 106/185 [00:01<00:00, 88.78it/s]
62% 115/185 [00:01<00:00, 88.81it/s]
67% 124/185 [00:01<00:00, 88.84it/s]
72% 133/185 [00:01<00:00, 88.87it/s]
77% 142/185 [00:01<00:00, 88.89it/s]
82% 151/185 [00:01<00:00, 88.92it/s]
86% 160/185 [00:01<00:00, 88.92it/s]
91% 169/185 [00:01<00:00, 88.91it/s]
96% 178/185 [00:02<00:00, 88.88it/s]
100% 185/185 [00:02<00:00, 87.59it/s]
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil 56. 70.100 / 56. 70.100
  libavcodec 58.134.100 / 58.134.100
  libavformat 58. 76.100 / 58. 76.100
  libavdevice 58. 13.100 / 58. 13.100
  libavfilter 7.110.100 / 7.110.100
  libswscale 5. 9.100 / 5. 9.100
  libswresample 3. 9.100 / 3. 9.100
  libpostproc 55. 9.100 / 55. 9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/src/res_path/output_noaudio.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf58.29.100
  Duration: 00:00:14.84, start: 0.000000, bitrate: 1048 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 512x896, 1045 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name : VideoHandler
      vendor_id : [0][0][0][0]
Guessed Channel Layout for Input Stream #1.0 : stereo
Input #1, wav, from '/src/tmp_path/input_audio.wav':
  Duration: 00:00:14.96, bitrate: 1411 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[libx264 @ 0x5e6466012380] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 @ 0x5e6466012380] profile High, level 3.1, 4:2:0, 8-bit
[libx264 @ 0x5e6466012380] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=18.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to '/src/res_path/output.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf58.76.100
  Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 512x896, q=2-31, 25 fps, 12800 tbn (default)
    Metadata:
      handler_name : VideoHandler
      vendor_id : [0][0][0][0]
      encoder : Lavc58.134.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder : Lavc58.134.100 aac
frame= 1 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x
frame= 158 fps=0.0 q=23.0 size= 768kB time=00:00:03.84 bitrate=1638.5kbits/s speed=7.38x
frame= 284 fps=275 q=23.0 size= 1536kB time=00:00:08.88 bitrate=1417.0kbits/s speed=8.59x
frame= 371 fps=243 q=-1.0 Lsize= 2834kB time=00:00:14.83 bitrate=1564.8kbits/s speed=9.72x
video:2587kB audio:234kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.479172%
[libx264 @ 0x5e6466012380] frame I:2 Avg QP:15.30 size: 76896
[libx264 @ 0x5e6466012380] frame P:100 Avg QP:17.22 size: 19413
[libx264 @ 0x5e6466012380] frame B:269 Avg QP:23.10 size: 2057
[libx264 @ 0x5e6466012380] consecutive B-frames: 1.1% 5.9% 2.4% 90.6%
[libx264 @ 0x5e6466012380] mb I I16..4: 1.6% 64.5% 33.8%
[libx264 @ 0x5e6466012380] mb P I16..4: 0.0% 0.7% 0.4% P16..4: 31.4% 28.0% 20.0% 0.0% 0.0% skip:19.4%
[libx264 @ 0x5e6466012380] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 35.5% 4.3% 1.1% direct: 1.1% skip:58.0% L0:35.6% L1:53.4% BI:11.0%
[libx264 @ 0x5e6466012380] 8x8 transform intra:63.6% inter:59.4%
[libx264 @ 0x5e6466012380] coded y,uvDC,uvAC intra: 93.5% 93.0% 69.6% inter: 15.9% 13.3% 0.6%
[libx264 @ 0x5e6466012380] i16 v,h,dc,p: 7% 17% 23% 53%
[libx264 @ 0x5e6466012380] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 12% 10% 7% 10% 11% 9% 14% 9%
[libx264 @ 0x5e6466012380] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 17% 11% 7% 6% 17% 13% 12% 10% 7%
[libx264 @ 0x5e6466012380] i8c dc,h,v,p: 41% 21% 24% 14%
[libx264 @ 0x5e6466012380] Weighted P-Frames: Y:13.0% UV:6.0%
[libx264 @ 0x5e6466012380] ref P L0: 61.9% 21.7% 13.1% 3.0% 0.3%
[libx264 @ 0x5e6466012380] ref B L0: 94.9% 4.4% 0.7%
[libx264 @ 0x5e6466012380] ref B L1: 97.8% 2.2%
[libx264 @ 0x5e6466012380] kb/s:1427.75
[aac @ 0x5e6466014240] Qavg: 186.558
Video generation complete
Total prediction time: 395.92 seconds
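A few figures in the log can be cross-checked against each other. The sketch below uses only values that appear in the log (25 diffusion steps at an average of 13.61 s per step, 371 encoded frames at 25 fps, and a reported total of 395.92 seconds). The variable names are illustrative, and reading the remainder as preprocessing and encoding time is an assumption.

```python
# Values copied from the execution log.
steps = 25                      # diffusion steps (inference_steps)
avg_s_per_step = 13.61          # final tqdm average, seconds per step
frames = 371                    # frames written by ffmpeg
fps = 25                        # output frame rate
reported_prediction_s = 395.92  # "Total prediction time" line

# Approximate time spent in the diffusion loop and its share of the run.
diffusion_s = steps * avg_s_per_step
diffusion_share = diffusion_s / reported_prediction_s

# Duration implied by the frame count; compare with "Duration: 00:00:14.84".
video_duration_s = frames / fps

print(f"diffusion loop: {diffusion_s:.2f} s ({diffusion_share:.1%} of prediction)")
print(f"video duration: {video_duration_s:.2f} s")
```

Under these numbers the diffusion loop accounts for most of the prediction time, which suggests that lowering inference_steps is the most direct way to shorten a run, at a possible cost in quality.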
Version Details
- Version ID: a8b068b27f183677c4dbd0f58574cbc07bb7f496cd1b3750bcd37586f1421abd
- Version Created: June 7, 2025