tiger-ai-lab/anyv2v

1.0K runs · Mar 2024 · Cog 0.9.4
Tags: video-editing, video-to-video

About

A tuning-free framework for video editing that achieves high appearance and temporal consistency.

Example Output

Performance Metrics

Prediction time: 65.41 s
Total time: 105.81 s
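As a rough cross-check of the prediction time, the average rate on the final line of each progress bar in the Example Execution Logs can be converted into seconds per stage. This is a sketch under stated assumptions: each bar's average rate is taken as representative, and the first 100-step bar is assumed to be the InstructPix2Pix first-frame edit (it finishes just before the log reports saving edited_first_frame.png). The stage labels are descriptive only, not names used by the model.

```python
# Steps and average throughput (it/s) from the final line of each progress bar
# in the example logs. Stage labels are descriptive, not model identifiers.
STAGES = {
    "first-frame edit (100 steps, assumed InstructPix2Pix)": (100, 26.56),
    "DDIM inversion (ddim_inversion_steps=100)": (100, 4.06),
    "sampling (num_inference_steps=50)": (50, 1.58),
}


def stage_seconds(stages: dict) -> dict:
    """Seconds per stage, computed as steps / (iterations per second)."""
    return {name: steps / rate for name, (steps, rate) in stages.items()}


times = stage_seconds(STAGES)
for name, secs in times.items():
    print(f"{name}: {secs:.1f} s")
print(f"total: {sum(times.values()):.1f} s of the 65.41 s prediction time")
```

With the logged rates this comes to roughly 3.8 s, 24.6 s and 31.6 s, about 60 s in total; the rest of the 65.41 s prediction time is spent outside these three progress bars.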
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/KcsKIflCcgFseI734HsfUIPHr4gBir2RTKoaFs73qGIB8qeo/test.mp4",
  "pnp_f_t": 1,
  "editing_prompt": "a man doing exercises for the body and mind",
  "guidance_scale": 9,
  "pnp_temp_attn_t": 1,
  "pnp_spatial_attn_t": 1,
  "num_inference_steps": 50,
  "ddim_inversion_steps": 100,
  "ddim_init_latents_t_idx": 0,
  "editing_negative_prompt": "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms",
  "instruct_pix2pix_prompt": "turn man into robot"
}
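Every value in the example above except video matches its documented default, so video is the only input strictly needed to reproduce it. A minimal sketch of submitting such a request with the official replicate Python package, assuming it is installed and a REPLICATE_API_TOKEN environment variable is set; check_inputs is a hypothetical client-side helper that mirrors the ranges listed under Input Parameters, not part of the model or the client.

```python
import os

# Version ID from the "Version Details" section of this page.
MODEL_REF = (
    "tiger-ai-lab/anyv2v:"
    "3c7b5bc5bcae13a0c945e6f918bb8409f76cc46102c7c9a208bd1531f82c5684"
)

# Documented (min, max) ranges from the Input Parameters list.
RANGES = {
    "pnp_f_t": (0, 1),
    "pnp_temp_attn_t": (0, 1),
    "pnp_spatial_attn_t": (0, 1),
    "guidance_scale": (1, 20),
    "num_inference_steps": (1, 500),
}


def check_inputs(inputs: dict) -> dict:
    """Hypothetical client-side check; returns the inputs unchanged if they pass."""
    if not inputs.get("video"):
        raise ValueError("video is required")
    for name, (lo, hi) in RANGES.items():
        if name in inputs and not lo <= inputs[name] <= hi:
            raise ValueError(f"{name}={inputs[name]} is outside [{lo}, {hi}]")
    # ddim_init_latents_t_idx must lie in [0, num_inference_steps - 1];
    # omitted keys fall back to the documented defaults (50 and 0).
    steps = inputs.get("num_inference_steps", 50)
    t_idx = inputs.get("ddim_init_latents_t_idx", 0)
    if not 0 <= t_idx <= steps - 1:
        raise ValueError(f"ddim_init_latents_t_idx must be in [0, {steps - 1}]")
    return inputs


# Only video is required; other parameters use their defaults when omitted.
EXAMPLE_INPUT = {
    "video": "https://replicate.delivery/pbxt/KcsKIflCcgFseI734HsfUIPHr4gBir2RTKoaFs73qGIB8qeo/test.mp4",
    "instruct_pix2pix_prompt": "turn man into robot",
}

if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # official client: pip install replicate

    print(replicate.run(MODEL_REF, input=check_inputs(EXAMPLE_INPUT)))
```

replicate.run returns the output described under Output Schema; depending on the client version, it is either a URL string or a file object wrapping that URL.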
Input Parameters
seed · Type: integer
Random seed. Leave blank to randomize the seed.
video (required) · Type: string
Input video to edit (for example, a URL to an .mp4 file).
pnp_f_t · Type: number · Default: 1 · Range: 0 - 1
Proportion of DDIM sampling time steps at which convolutional injection is applied. Higher values improve motion consistency; 1.0 applies injection at every time step.
editing_prompt · Type: string · Default: a man doing exercises for the body and mind
Text prompt describing the input video.
guidance_scale · Type: number · Default: 9 · Range: 1 - 20
Scale for classifier-free guidance.
pnp_temp_attn_t · Type: number · Default: 1 · Range: 0 - 1
Proportion of DDIM sampling time steps at which temporal attention injection is applied. Higher values improve motion consistency; 1.0 applies injection at every time step.
edited_first_frame · Type: string
Optional edited version of the input video's first frame. If left blank, the default pipeline edits the first frame with InstructPix2Pix using instruct_pix2pix_prompt.
pnp_spatial_attn_t · Type: number · Default: 1 · Range: 0 - 1
Proportion of DDIM sampling time steps at which spatial attention injection is applied. Higher values improve motion consistency; 1.0 applies injection at every time step.
num_inference_steps · Type: integer · Default: 50 · Range: 1 - 500
Number of denoising steps.
ddim_inversion_steps · Type: integer · Default: 100
Number of DDIM inversion steps.
ddim_init_latents_t_idx · Type: integer · Default: 0 · Range: 0 - ∞
Index of the time step at which sampling starts from the DDIM-inverted latents; valid values are 0 to num_inference_steps - 1. With 50 sampling steps, the scheduler visits the time steps [981, 961, 941, ..., 1], so an index of 0 starts sampling at t = 981 and an index of 1 starts it at t = 961. A higher index improves motion consistency with the source video, but may introduce flickering and cause the edited video to drift away from the edited first frame.
editing_negative_prompt · Type: string · Default: Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms
Negative prompt: things that should not appear in the edited video.
instruct_pix2pix_prompt · Type: string · Default: turn man into robot
Prompt for editing the first frame. The first step of the pipeline edits the first frame with timbrooks/instruct-pix2pix; this prompt is ignored if edited_first_frame is provided.
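guidance_scale and editing_negative_prompt both act through classifier-free guidance. In the usual formulation (as in diffusers pipelines, where the negative prompt takes the place of the empty unconditional prompt), each denoising step combines two noise predictions as below, with s = guidance_scale, c the editing_prompt embedding, c_neg the editing_negative_prompt embedding, and z_t the latent at time step t. That AnyV2V follows exactly this convention is an assumption, not something this page states.

```latex
\hat{\epsilon}_\theta(z_t, t)
  = \epsilon_\theta(z_t, t, c_{\mathrm{neg}})
  + s \,\bigl(\epsilon_\theta(z_t, t, c) - \epsilon_\theta(z_t, t, c_{\mathrm{neg}})\bigr)
```

With s = 1 the result is just the conditional prediction; larger values push each step further away from the negative prompt and toward the editing prompt.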
Output Schema

Output

Type: string · Format: uri
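The output is a single URI pointing to the edited video. A minimal sketch for saving it locally using only the Python standard library; output_filename and save_output are hypothetical helpers, and the output.mp4 fallback name is an assumption, since the schema only specifies a URI.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def output_filename(output_uri: str, fallback: str = "output.mp4") -> str:
    """Last path segment of the output URI, or an assumed fallback name."""
    name = Path(urlparse(output_uri).path).name
    return name or fallback


def save_output(output_uri: str, dest_dir: str = ".") -> Path:
    """Download the edited video into an existing directory and return its local path."""
    dest = Path(dest_dir) / output_filename(output_uri)
    urlretrieve(output_uri, dest)
    return dest
```

For example, save_output(uri, "results") downloads the video into an existing results directory; urlretrieve does not create missing directories.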

Example Execution Logs
Using seed: 32867
100%|██████████| 100/100 [00:03<00:00, 26.56it/s]
Processed and saved the first frame: exp_dir/edited_first_frame.png
The config attributes {'attention_head_dim': 64} were passed to I2VGenXLUNet, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00,  8.09it/s]
100%|██████████| 100/100 [00:24<00:00,  4.06it/s]
ddim_scheduler.timesteps: tensor([981, 961, 941, 921, 901, 881, 861, 841, 821, 801, 781, 761, 741, 721,
701, 681, 661, 641, 621, 601, 581, 561, 541, 521, 501, 481, 461, 441,
421, 401, 381, 361, 341, 321, 301, 281, 261, 241, 221, 201, 181, 161,
141, 121, 101,  81,  61,  41,  21,   1])
ddim_scheduler.timesteps[t_idx]: 981
ddim_latents_at_t.shape: torch.Size([1, 4, 16, 64, 64])
Blending random_ratio (1 means random latent): 0.0
100%|██████████| 50/50 [00:31<00:00,  1.58it/s]
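The scheduler time steps printed in the logs (981, 961, ..., 1 for 50 sampling steps) are consistent with a DDIM schedule over 1,000 training steps, evenly spaced ("leading" spacing) with an offset of 1. The sketch below reproduces that schedule and shows how ddim_init_latents_t_idx selects the starting step. The scheduler settings are assumptions inferred from the logged values. injection_timesteps is a hypothetical helper that assumes a pnp_*_t value p covers the first int(p × num_inference_steps) steps of the descending schedule, as in Plug-and-Play-style feature injection; the page itself only says p is a proportion of time steps.

```python
def ddim_timesteps(num_inference_steps: int, num_train_timesteps: int = 1000,
                   steps_offset: int = 1) -> list[int]:
    """Descending DDIM time steps with evenly spaced ('leading') steps (assumed config)."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio + steps_offset for i in reversed(range(num_inference_steps))]


def start_timestep(t_idx: int, timesteps: list[int]) -> int:
    """Time step where sampling from the DDIM-inverted latents begins."""
    if not 0 <= t_idx <= len(timesteps) - 1:
        raise ValueError(f"ddim_init_latents_t_idx must be in [0, {len(timesteps) - 1}]")
    return timesteps[t_idx]


def injection_timesteps(proportion: float, timesteps: list[int]) -> list[int]:
    """Hypothetical: steps receiving injection, assumed to be the first (noisiest) ones."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("pnp_*_t values must be in [0, 1]")
    return timesteps[: int(proportion * len(timesteps))]


ts = ddim_timesteps(50)
print(ts[:3], ts[-1])                # [981, 961, 941] 1
print(start_timestep(1, ts))         # 961
print(injection_timesteps(0.1, ts))  # [981, 961, 941, 921, 901]
```

Under these assumptions, the default pnp_*_t values of 1 inject at all 50 steps, and lowering a value restricts injection to the earliest, noisiest part of sampling.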
Version Details
Version ID
3c7b5bc5bcae13a0c945e6f918bb8409f76cc46102c7c9a208bd1531f82c5684
Version Created
March 28, 2024