cjwbw/aniportrait-audio2vid 🔢🖼️ → ❓
About
Audio-Driven Synthesis of Photorealistic Portrait Animations

Example Output
{"pose":"https://replicate.delivery/pbxt/nGTiFWLt6H6cFRrIBqx2DxYVCIJ5Nyd0Rebve9eCen5BNvXKB/pose.mp4","video":"https://replicate.delivery/pbxt/h79aTw41PmqzJZsW4egWAnE5siSAzHfFeCaSzFNAJ9NfMvXKB/out.mp4"}
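A minimal sketch of reading the example output, assuming the result arrives as the JSON object shown above with the two keys `pose` and `video`. The variable names are illustrative only.

```python
import json

# Example output copied verbatim from the run above.
raw_output = (
    '{"pose":"https://replicate.delivery/pbxt/'
    'nGTiFWLt6H6cFRrIBqx2DxYVCIJ5Nyd0Rebve9eCen5BNvXKB/pose.mp4",'
    '"video":"https://replicate.delivery/pbxt/'
    'h79aTw41PmqzJZsW4egWAnE5siSAzHfFeCaSzFNAJ9NfMvXKB/out.mp4"}'
)

result = json.loads(raw_output)

# The run produced two files: the intermediate pose video and the final animation.
pose_url = result["pose"]
video_url = result["video"]

print("pose video: ", pose_url)
print("final video:", video_url)
```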
Performance Metrics
- Prediction Time: 501.80s
- Total Time: 685.79s
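The gap between the two timings can be worked out directly. The sketch below assumes the difference is time spent outside the prediction itself (for example queueing or model setup); the page does not say what it consists of.

```python
# Timings reported for the example run, in seconds.
prediction_s = 501.80
total_s = 685.79

# Time not spent in the prediction itself (assumed: queueing/setup).
overhead_s = round(total_s - prediction_s, 2)
overhead_share = overhead_s / total_s

print(f"overhead: {overhead_s:.2f}s ({overhead_share:.1%} of total time)")
```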
All Input Parameters
{ "fps": 30, "audio": "https://replicate.delivery/pbxt/KfVpX7wBikBZbAqVyur6eBPFPzTeDExcl12VGYEnJgvecHSU/lyl.wav", "image": "https://replicate.delivery/pbxt/KfVpX606yiO1dn0ZDR8LAPcFsMFBmynKD5IEXWy2CFZnmzel/lyl.png", "steps": 25, "width": 512, "height": 512, "guidance_scale": 3.5 }
Input Parameters
- fps: Frames per second of the output video
- seed: Random seed. Leave blank to randomize the seed
- audio (required): Input audio
- image (required): Input image
- steps: Number of inference steps
- width: Width of the output video
- height: Height of the output video
- guidance_scale: Scale for classifier-free guidance
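The parameters above could be assembled and submitted roughly as follows. This is a sketch under stated assumptions: it uses the third-party `replicate` Python client and its `replicate.run` call, needs a `REPLICATE_API_TOKEN` in the environment, and the helper names `build_input` and `run_prediction` are hypothetical. The keyword defaults are the values from the example run on this page, not necessarily the model's own defaults.

```python
from typing import Any, Dict, Optional

# Model name and version ID as listed on this page.
MODEL_REF = (
    "cjwbw/aniportrait-audio2vid:"
    "3f976d8f2308f5c676a484e873f7d1ac09763f789fa211894df1ed96d3d17cb2"
)


def build_input(
    audio: str,
    image: str,
    *,
    fps: int = 30,
    steps: int = 25,
    width: int = 512,
    height: int = 512,
    guidance_scale: float = 3.5,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the input dict; defaults mirror the example run (an assumption)."""
    payload: Dict[str, Any] = {
        "audio": audio,  # required
        "image": image,  # required
        "fps": fps,
        "steps": steps,
        "width": width,
        "height": height,
        "guidance_scale": guidance_scale,
    }
    # Leaving the seed out lets the model pick a random one.
    if seed is not None:
        payload["seed"] = seed
    return payload


def run_prediction(payload: Dict[str, Any]) -> Any:
    """Submit the payload; imported lazily so the builder works offline."""
    import replicate  # third-party client, assumed installed

    return replicate.run(MODEL_REF, input=payload)


example = build_input(
    audio="https://replicate.delivery/pbxt/KfVpX7wBikBZbAqVyur6eBPFPzTeDExcl12VGYEnJgvecHSU/lyl.wav",
    image="https://replicate.delivery/pbxt/KfVpX606yiO1dn0ZDR8LAPcFsMFBmynKD5IEXWy2CFZnmzel/lyl.png",
)
print(sorted(example))
```

Calling `run_prediction(example)` would start a run comparable to the one documented here. Passing `seed=33894` to `build_input` would pin the seed to the value reported in the example logs.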
Output Schema
Example Execution Logs
Using seed: 33894
pose video has 233 frames, with 30 fps
/src/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
0%| | 0/25 [00:00<?, ?it/s]
4%|▍ | 1/25 [00:19<07:38, 19.12s/it]
...
100%|██████████| 25/25 [07:57<00:00, 19.10s/it]
0%| | 0/233 [00:00<?, ?it/s]
2%|▏ | 5/233 [00:00<00:05, 42.87it/s]
...
100%|██████████| 233/233 [00:12<00:00, 18.49it/s]
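The figures in the logs can be cross-checked with a little arithmetic. The sketch below only uses numbers printed in the log (233 frames, 30 fps, 25 steps at about 19.10 s per step, 233 items at about 18.49 per second); treating the first progress bar as the denoising loop is an assumption based on the step count matching `steps`.

```python
# Values read from the example logs.
frames = 233
fps = 30
steps = 25
seconds_per_step = 19.10  # average rate on the final line of the 25-step bar
items_per_second = 18.49  # average rate on the final line of the 233-item bar

# Length of the generated clip.
clip_seconds = frames / fps

# Time spent in the 25-step loop (assumed to be denoising); the log shows 07:57.
step_loop_seconds = steps * seconds_per_step

# Time spent in the second, per-frame pass; the log shows 00:12.
frame_pass_seconds = frames / items_per_second

print(f"clip length:     {clip_seconds:.2f}s")
print(f"25-step loop:    {step_loop_seconds:.1f}s")
print(f"per-frame pass:  {frame_pass_seconds:.1f}s")
```

Together the two loops account for roughly 490 s, most of the 501.80s prediction time reported for this run.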
Version Details
- Version ID: 3f976d8f2308f5c676a484e873f7d1ac09763f789fa211894df1ed96d3d17cb2
- Version Created: April 1, 2024