cjwbw/aniportrait-audio2vid

▶️ 14.7K runs 📅 Apr 2024 ⚙️ Cog 0.9.4 🔗 GitHub 📄 Paper ⚖️ License
audio-to-video lipsync portrait-animation

About

Audio-Driven Synthesis of Photorealistic Portrait Animations

Example Output

{"pose":"https://replicate.delivery/pbxt/nGTiFWLt6H6cFRrIBqx2DxYVCIJ5Nyd0Rebve9eCen5BNvXKB/pose.mp4","video":"https://replicate.delivery/pbxt/h79aTw41PmqzJZsW4egWAnE5siSAzHfFeCaSzFNAJ9NfMvXKB/out.mp4"}

Performance Metrics

Prediction Time: 501.80s
Total Time: 685.79s
All Input Parameters
{
  "fps": 30,
  "audio": "https://replicate.delivery/pbxt/KfVpX7wBikBZbAqVyur6eBPFPzTeDExcl12VGYEnJgvecHSU/lyl.wav",
  "image": "https://replicate.delivery/pbxt/KfVpX606yiO1dn0ZDR8LAPcFsMFBmynKD5IEXWy2CFZnmzel/lyl.png",
  "steps": 25,
  "width": 512,
  "height": 512,
  "guidance_scale": 3.5
}
Input Parameters
fps Type: integer, Default: 30
Frames per second of the output video
seed Type: integer
Random seed. Leave blank to use a random seed
audio (required) Type: string
Input audio
image (required) Type: string
Input image
steps Type: integer, Default: 25
Number of inference steps
width Type: integer, Default: 512
Width of the output video
height Type: integer, Default: 512
Height of the output video
guidance_scale Type: number, Default: 3.5
Scale for classifier-free guidance
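
The parameter list above can be turned into a small payload builder. This is a sketch only: the helper `build_input` and the placeholder URLs are hypothetical, and the type checks simply mirror the types and defaults listed on this page.

```python
# Defaults taken from the parameter list above; `seed` has no default.
DEFAULTS = {"fps": 30, "steps": 25, "width": 512, "height": 512, "guidance_scale": 3.5}

def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)

def build_input(audio: str, image: str, **overrides) -> dict:
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS) - {"seed"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "audio": audio, "image": image}
    for name in ("fps", "steps", "width", "height"):
        if not _is_int(payload[name]):
            raise TypeError(f"{name} must be an integer")
    if "seed" in payload and not _is_int(payload["seed"]):
        raise TypeError("seed must be an integer")
    scale = payload["guidance_scale"]
    if isinstance(scale, bool) or not isinstance(scale, (int, float)):
        raise TypeError("guidance_scale must be a number")
    return payload

payload = build_input(
    audio="https://example.com/speech.wav",    # placeholder URL
    image="https://example.com/portrait.png",  # placeholder URL
    steps=20,
)
print(payload)
```

Leaving `seed` out of the payload matches the "leave blank" behaviour described for that parameter.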
Output Schema
Example Execution Logs
Using seed: 33894
pose video has 233 frames, with 30 fps
/src/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [00:19<07:38, 19.12s/it]
  8%|▊         | 2/25 [00:38<07:17, 19.01s/it]
 12%|█▏        | 3/25 [00:57<06:58, 19.01s/it]
 16%|█▌        | 4/25 [01:16<06:39, 19.03s/it]
 20%|██        | 5/25 [01:35<06:20, 19.04s/it]
 24%|██▍       | 6/25 [01:54<06:02, 19.05s/it]
 28%|██▊       | 7/25 [02:13<05:43, 19.07s/it]
 32%|███▏      | 8/25 [02:32<05:24, 19.08s/it]
 36%|███▌      | 9/25 [02:51<05:05, 19.09s/it]
 40%|████      | 10/25 [03:10<04:46, 19.10s/it]
 44%|████▍     | 11/25 [03:29<04:27, 19.10s/it]
 48%|████▊     | 12/25 [03:48<04:08, 19.11s/it]
 52%|█████▏    | 13/25 [04:08<03:49, 19.11s/it]
 56%|█████▌    | 14/25 [04:27<03:30, 19.11s/it]
 60%|██████    | 15/25 [04:46<03:11, 19.11s/it]
 64%|██████▍   | 16/25 [05:05<02:52, 19.12s/it]
 68%|██████▊   | 17/25 [05:24<02:32, 19.12s/it]
 72%|███████▏  | 18/25 [05:43<02:13, 19.12s/it]
 76%|███████▌  | 19/25 [06:02<01:54, 19.12s/it]
 80%|████████  | 20/25 [06:21<01:35, 19.12s/it]
 84%|████████▍ | 21/25 [06:41<01:16, 19.12s/it]
 88%|████████▊ | 22/25 [07:00<00:57, 19.12s/it]
 92%|█████████▏| 23/25 [07:19<00:38, 19.12s/it]
 96%|█████████▌| 24/25 [07:38<00:19, 19.12s/it]
100%|██████████| 25/25 [07:57<00:00, 19.12s/it]
100%|██████████| 25/25 [07:57<00:00, 19.10s/it]
  0%|          | 0/233 [00:00<?, ?it/s]
  2%|▏         | 5/233 [00:00<00:05, 42.87it/s]
  4%|▍         | 10/233 [00:00<00:09, 23.92it/s]
  6%|▌         | 13/233 [00:00<00:10, 21.69it/s]
  7%|▋         | 16/233 [00:00<00:10, 20.45it/s]
  8%|▊         | 19/233 [00:00<00:10, 19.71it/s]
  9%|▉         | 22/233 [00:01<00:10, 19.23it/s]
 10%|█         | 24/233 [00:01<00:11, 18.99it/s]
 11%|█         | 26/233 [00:01<00:11, 18.81it/s]
 12%|█▏        | 28/233 [00:01<00:10, 18.67it/s]
 13%|█▎        | 30/233 [00:01<00:10, 18.56it/s]
 14%|█▎        | 32/233 [00:01<00:10, 18.48it/s]
 15%|█▍        | 34/233 [00:01<00:10, 18.43it/s]
 15%|█▌        | 36/233 [00:01<00:10, 18.38it/s]
 16%|█▋        | 38/233 [00:01<00:10, 18.35it/s]
 17%|█▋        | 40/233 [00:02<00:10, 18.33it/s]
 18%|█▊        | 42/233 [00:02<00:10, 18.31it/s]
 19%|█▉        | 44/233 [00:02<00:10, 18.30it/s]
 20%|█▉        | 46/233 [00:02<00:10, 18.29it/s]
 21%|██        | 48/233 [00:02<00:10, 18.28it/s]
 21%|██▏       | 50/233 [00:02<00:10, 18.28it/s]
 22%|██▏       | 52/233 [00:02<00:09, 18.27it/s]
 23%|██▎       | 54/233 [00:02<00:09, 18.28it/s]
 24%|██▍       | 56/233 [00:02<00:09, 18.28it/s]
 25%|██▍       | 58/233 [00:03<00:09, 18.27it/s]
 26%|██▌       | 60/233 [00:03<00:09, 18.27it/s]
 27%|██▋       | 62/233 [00:03<00:09, 18.26it/s]
 27%|██▋       | 64/233 [00:03<00:09, 18.25it/s]
 28%|██▊       | 66/233 [00:03<00:09, 18.25it/s]
 29%|██▉       | 68/233 [00:03<00:09, 18.26it/s]
 30%|███       | 70/233 [00:03<00:08, 18.26it/s]
 31%|███       | 72/233 [00:03<00:08, 18.25it/s]
 32%|███▏      | 74/233 [00:03<00:08, 18.25it/s]
 33%|███▎      | 76/233 [00:04<00:08, 18.25it/s]
 33%|███▎      | 78/233 [00:04<00:08, 18.26it/s]
 34%|███▍      | 80/233 [00:04<00:08, 18.26it/s]
 35%|███▌      | 82/233 [00:04<00:08, 18.27it/s]
 36%|███▌      | 84/233 [00:04<00:08, 18.26it/s]
 37%|███▋      | 86/233 [00:04<00:08, 18.27it/s]
 38%|███▊      | 88/233 [00:04<00:07, 18.26it/s]
 39%|███▊      | 90/233 [00:04<00:07, 18.26it/s]
 39%|███▉      | 92/233 [00:04<00:07, 18.25it/s]
 40%|████      | 94/233 [00:04<00:07, 18.23it/s]
 41%|████      | 96/233 [00:05<00:07, 18.23it/s]
 42%|████▏     | 98/233 [00:05<00:07, 18.24it/s]
 43%|████▎     | 100/233 [00:05<00:07, 18.26it/s]
 44%|████▍     | 102/233 [00:05<00:07, 18.26it/s]
 45%|████▍     | 104/233 [00:05<00:07, 18.27it/s]
 45%|████▌     | 106/233 [00:05<00:06, 18.27it/s]
 46%|████▋     | 108/233 [00:05<00:06, 18.26it/s]
 47%|████▋     | 110/233 [00:05<00:06, 18.25it/s]
 48%|████▊     | 112/233 [00:05<00:06, 18.26it/s]
 49%|████▉     | 114/233 [00:06<00:06, 18.24it/s]
 50%|████▉     | 116/233 [00:06<00:06, 18.25it/s]
 51%|█████     | 118/233 [00:06<00:06, 18.24it/s]
 52%|█████▏    | 120/233 [00:06<00:06, 18.24it/s]
 52%|█████▏    | 122/233 [00:06<00:06, 18.25it/s]
 53%|█████▎    | 124/233 [00:06<00:05, 18.27it/s]
 54%|█████▍    | 126/233 [00:06<00:05, 18.26it/s]
 55%|█████▍    | 128/233 [00:06<00:05, 18.27it/s]
 56%|█████▌    | 130/233 [00:06<00:05, 18.26it/s]
 57%|█████▋    | 132/233 [00:07<00:05, 18.27it/s]
 58%|█████▊    | 134/233 [00:07<00:05, 18.27it/s]
 58%|█████▊    | 136/233 [00:07<00:05, 18.27it/s]
 59%|█████▉    | 138/233 [00:07<00:05, 18.27it/s]
 60%|██████    | 140/233 [00:07<00:05, 18.26it/s]
 61%|██████    | 142/233 [00:07<00:04, 18.25it/s]
 62%|██████▏   | 144/233 [00:07<00:04, 18.25it/s]
 63%|██████▎   | 146/233 [00:07<00:04, 18.25it/s]
 64%|██████▎   | 148/233 [00:07<00:04, 18.26it/s]
 64%|██████▍   | 150/233 [00:08<00:04, 18.26it/s]
 65%|██████▌   | 152/233 [00:08<00:04, 18.26it/s]
 66%|██████▌   | 154/233 [00:08<00:04, 18.26it/s]
 67%|██████▋   | 156/233 [00:08<00:04, 18.27it/s]
 68%|██████▊   | 158/233 [00:08<00:04, 18.26it/s]
 69%|██████▊   | 160/233 [00:08<00:03, 18.26it/s]
 70%|██████▉   | 162/233 [00:08<00:03, 18.26it/s]
 70%|███████   | 164/233 [00:08<00:03, 18.26it/s]
 71%|███████   | 166/233 [00:08<00:03, 18.26it/s]
 72%|███████▏  | 168/233 [00:09<00:03, 18.25it/s]
 73%|███████▎  | 170/233 [00:09<00:03, 18.25it/s]
 74%|███████▍  | 172/233 [00:09<00:03, 18.25it/s]
 75%|███████▍  | 174/233 [00:09<00:03, 18.25it/s]
 76%|███████▌  | 176/233 [00:09<00:03, 18.25it/s]
 76%|███████▋  | 178/233 [00:09<00:03, 18.26it/s]
 77%|███████▋  | 180/233 [00:09<00:02, 18.26it/s]
 78%|███████▊  | 182/233 [00:09<00:02, 18.27it/s]
 79%|███████▉  | 184/233 [00:09<00:02, 18.27it/s]
 80%|███████▉  | 186/233 [00:10<00:02, 18.26it/s]
 81%|████████  | 188/233 [00:10<00:02, 18.27it/s]
 82%|████████▏ | 190/233 [00:10<00:02, 18.25it/s]
 82%|████████▏ | 192/233 [00:10<00:02, 18.26it/s]
 83%|████████▎ | 194/233 [00:10<00:02, 18.27it/s]
 84%|████████▍ | 196/233 [00:10<00:02, 18.27it/s]
 85%|████████▍ | 198/233 [00:10<00:01, 18.27it/s]
 86%|████████▌ | 200/233 [00:10<00:01, 18.26it/s]
 87%|████████▋ | 202/233 [00:10<00:01, 18.26it/s]
 88%|████████▊ | 204/233 [00:11<00:01, 18.24it/s]
 88%|████████▊ | 206/233 [00:11<00:01, 18.23it/s]
 89%|████████▉ | 208/233 [00:11<00:01, 18.24it/s]
 90%|█████████ | 210/233 [00:11<00:01, 18.25it/s]
 91%|█████████ | 212/233 [00:11<00:01, 18.25it/s]
 92%|█████████▏| 214/233 [00:11<00:01, 18.24it/s]
 93%|█████████▎| 216/233 [00:11<00:00, 18.24it/s]
 94%|█████████▎| 218/233 [00:11<00:00, 18.25it/s]
 94%|█████████▍| 220/233 [00:11<00:00, 18.26it/s]
 95%|█████████▌| 222/233 [00:12<00:00, 18.25it/s]
 96%|█████████▌| 224/233 [00:12<00:00, 18.26it/s]
 97%|█████████▋| 226/233 [00:12<00:00, 18.26it/s]
 98%|█████████▊| 228/233 [00:12<00:00, 18.27it/s]
 99%|█████████▊| 230/233 [00:12<00:00, 18.27it/s]
100%|█████████▉| 232/233 [00:12<00:00, 18.27it/s]
100%|██████████| 233/233 [00:12<00:00, 18.49it/s]
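
The numbers in these logs account for most of the reported prediction time. The short calculation below is a rough sketch that uses only figures printed on this page. It assumes the 25-iteration progress bar is the inference-step loop (it matches `steps` = 25); the page does not say what the second, 233-iteration bar does, nor what fills the gap between prediction time and total time.

```python
# Figures copied from the logs and the Performance Metrics section.
frames, fps = 233, 30
steps, sec_per_step = 25, 19.10   # first progress bar, final average
second_bar_rate = 18.49           # it/s, second progress bar, final average
prediction_time, total_time = 501.80, 685.79

clip_seconds = frames / fps                       # length of the output clip
step_loop_seconds = steps * sec_per_step          # time in the 25-step loop
second_bar_seconds = frames / second_bar_rate     # time in the 233-iteration loop
outside_prediction = total_time - prediction_time # time not counted as prediction

print(f"clip length:           {clip_seconds:.2f} s")
print(f"25-step loop:          {step_loop_seconds:.1f} s")
print(f"233-iteration loop:    {second_bar_seconds:.1f} s")
print(f"outside prediction:    {outside_prediction:.2f} s")
print(f"step loop share:       {step_loop_seconds / prediction_time:.1%}")
```

On these figures the step loop dominates the run, so lowering `steps` is the most direct way to shorten a prediction, at whatever cost to quality that brings.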
Version Details
Version ID
3f976d8f2308f5c676a484e873f7d1ac09763f789fa211894df1ed96d3d17cb2
Version Created
April 1, 2024
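
To pin a run to this exact version, Replicate model references take the form `owner/name:version`. The sketch below only builds and checks that string. The helper `model_ref` is hypothetical, and the hex check is an assumption inferred from the ID shown above rather than a documented rule.

```python
MODEL = "cjwbw/aniportrait-audio2vid"
VERSION = "3f976d8f2308f5c676a484e873f7d1ac09763f789fa211894df1ed96d3d17cb2"

def model_ref(model: str, version: str) -> str:
    """Join model and version as 'owner/name:version' after basic checks."""
    if model.count("/") != 1:
        raise ValueError("expected a model name of the form 'owner/name'")
    # Assumption: version IDs are lowercase hex strings, like the one above.
    if not version or any(c not in "0123456789abcdef" for c in version):
        raise ValueError("expected a lowercase hex version ID")
    return f"{model}:{version}"

ref = model_ref(MODEL, VERSION)
print(ref)

# With the `replicate` package installed and REPLICATE_API_TOKEN set,
# a call might then look like this (not executed here):
#   import replicate
#   output = replicate.run(ref, input={"audio": audio_url, "image": image_url})
```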