tiger-ai-lab/anyv2v 🔢🖼️📝 → 🖼️
About
Tuning-free framework to achieve high appearance and temporal consistency in video editing

Example Output
Output
Performance Metrics
- Prediction Time: 65.41s
- Total Time: 105.81s
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/KcsKIflCcgFseI734HsfUIPHr4gBir2RTKoaFs73qGIB8qeo/test.mp4",
  "pnp_f_t": 1,
  "editing_prompt": "a man doing exercises for the body and mind",
  "guidance_scale": 9,
  "pnp_temp_attn_t": 1,
  "pnp_spatial_attn_t": 1,
  "num_inference_steps": 50,
  "ddim_inversion_steps": 100,
  "ddim_init_latents_t_idx": 0,
  "editing_negative_prompt": "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms",
  "instruct_pix2pix_prompt": "turn man into robot"
}
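The parameter set above can also be assembled in code. The following is a minimal sketch, assuming the `replicate` Python client is installed and the `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_inputs` and `run_anyv2v` are hypothetical; the defaults are simply the values from the example run, and the version hash is the one listed under Version Details.

```python
import json

# Model reference: owner/name from the page title, version hash from Version Details.
MODEL = (
    "tiger-ai-lab/anyv2v:"
    "3c7b5bc5bcae13a0c945e6f918bb8409f76cc46102c7c9a208bd1531f82c5684"
)


def build_inputs(video_url, instruct_pix2pix_prompt, editing_prompt, **overrides):
    """Return an input dict that uses the example run's values as defaults.

    Hypothetical helper: extra keyword arguments (for example seed=123)
    override or extend the defaults.
    """
    inputs = {
        "video": video_url,
        "pnp_f_t": 1,
        "editing_prompt": editing_prompt,
        "guidance_scale": 9,
        "pnp_temp_attn_t": 1,
        "pnp_spatial_attn_t": 1,
        "num_inference_steps": 50,
        "ddim_inversion_steps": 100,
        "ddim_init_latents_t_idx": 0,
        "editing_negative_prompt": (
            "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, "
            "static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
        ),
        "instruct_pix2pix_prompt": instruct_pix2pix_prompt,
    }
    inputs.update(overrides)
    return inputs


def run_anyv2v(inputs):
    """Submit the prediction. Requires network access and an API token."""
    import replicate  # imported lazily so build_inputs works without the client

    return replicate.run(MODEL, input=inputs)


if __name__ == "__main__":
    example = build_inputs(
        "https://replicate.delivery/pbxt/KcsKIflCcgFseI734HsfUIPHr4gBir2RTKoaFs73qGIB8qeo/test.mp4",
        instruct_pix2pix_prompt="turn man into robot",
        editing_prompt="a man doing exercises for the body and mind",
    )
    print(json.dumps(example, indent=2))
```

Running the script only prints the payload; call `run_anyv2v(example)` to actually start a prediction.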
Input Parameters
- seed: Random seed. Leave blank to randomize the seed.
- video (required): Input video.
- pnp_f_t: Specifies the proportion of time steps in the DDIM sampling process where the convolutional injection is applied. A higher value improves motion consistency. 1.0 indicates injection at every time step.
- editing_prompt: Describe the input video.
- guidance_scale: Scale for classifier-free guidance.
- pnp_temp_attn_t: Specifies the proportion of time steps in the DDIM sampling process where the temporal attention injection is applied. A higher value improves motion consistency. 1.0 indicates injection at every time step.
- edited_first_frame: Provide the edited first frame of the input video. Optional: leave it blank and provide the prompt below (instruct_pix2pix_prompt) to use the default pipeline, which edits the first frame with InstructPix2Pix.
- pnp_spatial_attn_t: Specifies the proportion of time steps in the DDIM sampling process where the spatial attention injection is applied. A higher value improves motion consistency. 1.0 indicates injection at every time step.
- num_inference_steps: Number of denoising steps.
- ddim_inversion_steps: Number of DDIM inversion steps.
- ddim_init_latents_t_idx: Determines the time step index at which sampling begins from the initial DDIM-inverted latents, with a range of [0, num_inference_steps-1]. In a DDIM sampling process with 50 sampling steps, the scheduler progresses through the time steps in the sequence [981, 961, 941, ..., 1]. Setting ddim_init_latents_t_idx to 0 therefore starts sampling from t=981, whereas setting it to 1 starts at t=961. A higher index enhances motion consistency with the source video but may lead to flickering and cause the edited video to diverge from the edited first frame.
- editing_negative_prompt: Things not to see in the edited video.
- instruct_pix2pix_prompt: The first step involves using timbrooks/instruct-pix2pix to edit the first frame. Specify the prompt for editing the first frame. This will be ignored if edited_first_frame above is provided.
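To make the ddim_init_latents_t_idx and pnp_*_t descriptions concrete, here is a small sketch that reproduces the time step sequence [981, 961, 941, ..., 1] and converts an injection proportion into a step count. It assumes 1000 training time steps and an offset of 1, which matches the sequence quoted above but is not confirmed by the page. The function names are hypothetical, and the rule "inject during the first floor(proportion × steps) sampling steps" is an assumption about how the proportion is applied, not the model's documented behavior.

```python
def ddim_timesteps(num_inference_steps=50, num_train_timesteps=1000, steps_offset=1):
    """Evenly spaced, descending time steps (assumed: 1000 train steps, offset 1)."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio + steps_offset for i in reversed(range(num_inference_steps))]


def start_timestep(ddim_init_latents_t_idx, num_inference_steps=50):
    """Time step at which sampling starts for a given index."""
    if not 0 <= ddim_init_latents_t_idx <= num_inference_steps - 1:
        raise ValueError("index must be in [0, num_inference_steps-1]")
    return ddim_timesteps(num_inference_steps)[ddim_init_latents_t_idx]


def injection_steps(proportion, num_inference_steps=50):
    """Number of sampling steps that receive injection (assumed rule: floor)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    return int(num_inference_steps * proportion)


if __name__ == "__main__":
    print(ddim_timesteps()[:3], "...", ddim_timesteps()[-1])
    print("idx 0 ->", start_timestep(0), "| idx 1 ->", start_timestep(1))
    print("pnp_f_t=1.0 ->", injection_steps(1.0), "of 50 steps")
```

With the example run's settings (index 0 and all three proportions set to 1), sampling starts at t=981 and, under the assumed rule, injection is applied at all 50 steps.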
Output Schema
Output
Example Execution Logs
Using seed: 32867
100%|██████████| 100/100 [00:03<00:00, 26.56it/s]
Processed and saved the first frame: exp_dir/edited_first_frame.png
The config attributes {'attention_head_dim': 64} were passed to I2VGenXLUNet, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 8.09it/s]
100%|██████████| 100/100 [00:24<00:00, 4.06it/s]
ddim_scheduler.timesteps: tensor([981, 961, 941, 921, 901, 881, 861, 841, 821, 801, 781, 761, 741, 721, 701, 681, 661, 641, 621, 601, 581, 561, 541, 521, 501, 481, 461, 441, 421, 401, 381, 361, 341, 321, 301, 281, 261, 241, 221, 201, 181, 161, 141, 121, 101, 81, 61, 41, 21, 1])
ddim_scheduler.timesteps[t_idx]: 981
ddim_latents_at_t.shape: torch.Size([1, 4, 16, 64, 64])
Blending random_ratio (1 means random latent): 0.0
100%|██████████| 50/50 [00:31<00:00, 1.58it/s]
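The logs contain three progress bars, and their final lines give a rough breakdown of where the 65.41s prediction time goes. The sketch below parses the elapsed time from each final tqdm line. The stage labels are inferred from the log order and the step counts (100, 100, 50) and are not stated by the page; `elapsed_seconds` and `summarize` are hypothetical helper names.

```python
import re

# Final tqdm line of each progress bar, copied from the example logs.
# Stage labels are an inference from log order and step counts.
FINAL_BARS = {
    "first-frame edit (assumed)": "100%|██████████| 100/100 [00:03<00:00, 26.56it/s]",
    "DDIM inversion (assumed)": "100%|██████████| 100/100 [00:24<00:00, 4.06it/s]",
    "DDIM sampling (assumed)": "100%|██████████| 50/50 [00:31<00:00, 1.58it/s]",
}

# Matches "done/total [MM:SS<" in a tqdm line.
BAR = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<")


def elapsed_seconds(line):
    """Return the elapsed time in seconds shown in a tqdm progress line."""
    match = BAR.search(line)
    if match is None:
        raise ValueError(f"not a tqdm progress line: {line!r}")
    _done, _total, minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def summarize(bars, prediction_time):
    """Return per-stage seconds, their sum, and the unaccounted remainder."""
    stages = {name: elapsed_seconds(line) for name, line in bars.items()}
    total = sum(stages.values())
    return stages, total, prediction_time - total


if __name__ == "__main__":
    stages, total, remainder = summarize(FINAL_BARS, prediction_time=65.41)
    for name, secs in stages.items():
        print(f"{name}: {secs}s")
    print(f"progress bars: {total}s, other work: {remainder:.2f}s")
```

Because tqdm reports whole seconds, the figures are approximate; the remainder covers work the bars do not show, such as loading the pipeline components.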
Version Details
- Version ID: 3c7b5bc5bcae13a0c945e6f918bb8409f76cc46102c7c9a208bd1531f82c5684
- Version Created: March 28, 2024