lucataco/zeta-editing
About
Zero-Shot Text-Based Audio Editing Using DDPM Inversion
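As we understand it, the technique named above edits audio by inverting it into a DDPM noise space and then regenerating it under a new text prompt. The toy sketch below, in plain Python, illustrates the edit-friendly style of DDPM inversion that this approach builds on. It samples every noisy latent x_t independently from the clean input, solves for the per-step noise maps z_t that reproduce that exact trajectory, and then reruns the reverse process from an intermediate step while reusing those maps. Several pieces are assumptions and do not come from this page: the linear beta schedule, the 8-value "latent", and the two `eps_*` functions, which are hypothetical stand-ins for AudioLDM2's text-conditioned noise predictor. It is a sketch of the idea, not the model's implementation.

```python
import math
import random

def linear_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumption; AudioLDM2 ships its own scheduler config)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)  # alpha_bars[t-1] is the cumulative product up to step t
    return betas, alpha_bars

def posterior_mean(x_t, t, eps_model, betas, alpha_bars):
    """DDPM reverse-step mean mu_t(x_t), computed from the predicted noise."""
    beta, ab = betas[t - 1], alpha_bars[t - 1]
    coef = beta / math.sqrt(1.0 - ab)
    eps = eps_model(x_t, t)
    return [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(x_t, eps)]

def ddpm_invert(x0, eps_model, betas, alpha_bars, rng):
    """Edit-friendly inversion: draw each x_t independently from x0, then solve for z_t."""
    T = len(betas)
    xs = [list(x0)]
    for t in range(1, T + 1):
        ab = alpha_bars[t - 1]
        xs.append([math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0])
    zs = {}
    for t in range(T, 0, -1):
        mu = posterior_mean(xs[t], t, eps_model, betas, alpha_bars)
        sigma = math.sqrt(betas[t - 1])
        # z_t is chosen so that mu_t(x_t) + sigma_t * z_t lands exactly on x_{t-1}
        zs[t] = [(prev - m) / sigma for prev, m in zip(xs[t - 1], mu)]
    return xs, zs

def ddpm_sample(x_start, start_step, zs, eps_model, betas, alpha_bars):
    """Reverse process from start_step down to 0, reusing the extracted noise maps."""
    x = list(x_start)
    for t in range(start_step, 0, -1):
        mu = posterior_mean(x, t, eps_model, betas, alpha_bars)
        sigma = math.sqrt(betas[t - 1])
        x = [m + sigma * z for m, z in zip(mu, zs[t])]
    return x

def eps_source(x, t):
    """Hypothetical stand-in for the noise predictor conditioned on the source prompt."""
    return [0.5 * v for v in x]

def eps_target(x, t):
    """Hypothetical stand-in for the noise predictor conditioned on the target prompt."""
    return [0.5 * v + 0.05 for v in x]

rng = random.Random(0)
betas, alpha_bars = linear_schedule(50)          # 50 steps, as in the example run
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # toy "audio latent"
xs, zs = ddpm_invert(x0, eps_source, betas, alpha_bars, rng)
start_step = 22                                  # regenerate only the last 22 steps
recon = ddpm_sample(xs[start_step], start_step, zs, eps_source, betas, alpha_bars)
edited = ddpm_sample(xs[start_step], start_step, zs, eps_target, betas, alpha_bars)
```

If the same predictor is reused with the extracted noise maps, the input is reconstructed up to floating-point round-off. Swapping in a different predictor changes the result, while the shared noise maps keep it tied to the original structure.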
Example Output
Prompt:
"A recording of an arcade game soundtrack"
Output
Performance Metrics
- Prediction Time: 14.98 s
- Total Time: 15.02 s
All Input Parameters
{
"audio": "https://replicate.delivery/pbxt/KVudHMNiL8E0LcDPJfRr8WzgmY5Mry4d2uhYqMyU7xseJdWr/Beethoven.wav",
"steps": 50,
"prompt": "A recording of an arcade game soundtrack",
"t_start": 45,
"audio_version": "cvssp/audioldm2-music",
"cfg_scale_src": 3,
"cfg_scale_tar": 12,
"source_prompt": ""
}
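The input object above can be assembled and sent from Python. The minimal sketch below assumes the official `replicate` Python client (`pip install replicate`, with `REPLICATE_API_TOKEN` set) and builds the model reference from the model name and the Version ID listed under Version Details. The helper names `build_input` and `run_edit` are hypothetical. Their default values are copied from the example run and are not necessarily the model's own defaults.

```python
from typing import Any, Optional

# Model reference assembled from the model name and the Version ID on this page.
MODEL_REF = "lucataco/zeta-editing:ff80c3cca3dc792e2cb119367dc11295b499cda961a2d0b34ecf5109b59c6528"

def build_input(audio: str, prompt: str, *, steps: int = 50, t_start: int = 45,
                audio_version: str = "cvssp/audioldm2-music", cfg_scale_src: float = 3,
                cfg_scale_tar: float = 12, source_prompt: str = "",
                seed: Optional[int] = None) -> dict[str, Any]:
    """Assemble the input dict; defaults mirror the example run (an assumption)."""
    payload: dict[str, Any] = {
        "audio": audio,
        "steps": steps,
        "prompt": prompt,
        "t_start": t_start,
        "audio_version": audio_version,
        "cfg_scale_src": cfg_scale_src,
        "cfg_scale_tar": cfg_scale_tar,
        "source_prompt": source_prompt,
    }
    if seed is not None:
        payload["seed"] = seed  # omit it and the run appears to pick (and log) a random seed
    return payload

def run_edit(payload: dict[str, Any]) -> Any:
    """Call the hosted model; the output type (URL or file object) depends on client version."""
    import replicate  # imported lazily so build_input works without the client installed
    return replicate.run(MODEL_REF, input=payload)
```

Passing a fixed `seed` should make repeated runs with otherwise identical inputs comparable. Leaving it out matches the example above, where no seed was supplied.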
Input Parameters
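- seed: Random seed; when omitted, a seed is chosen automatically and reported in the logs (see "Using seed" in the example logs)
- audio (required): Input audio file
- steps: Number of diffusion steps; higher values (e.g., 200) yield higher-quality generations
- prompt: Description of the desired edited output
- t_start: Edit strength, in percent; lower values keep the result closer to the original audio, higher values produce a stronger edit
- audio_version: AudioLDM2 checkpoint used for editing (the example uses cvssp/audioldm2-music)
- cfg_scale_src: Guidance scale for the source prompt
- cfg_scale_tar: Guidance scale for the target (edit) prompt
- source_prompt: Optional description of the original input audio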
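cfg_scale_src and cfg_scale_tar are guidance scales. The page does not say how they are applied. If they follow standard classifier-free guidance, which is an assumption here, each scale blends an unconditional noise prediction with a prompt-conditioned one, as in this sketch. The helper name `guided_noise` and the toy vectors are hypothetical.

```python
from typing import Sequence

def guided_noise(eps_uncond: Sequence[float], eps_cond: Sequence[float],
                 scale: float) -> list[float]:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0]  # toy unconditional prediction
eps_cond = [1.0, 1.0]    # toy prompt-conditioned prediction
src = guided_noise(eps_uncond, eps_cond, 3)   # cfg_scale_src from the example
tar = guided_noise(eps_uncond, eps_cond, 12)  # cfg_scale_tar from the example
```

A scale of 1 returns the conditional prediction unchanged, and larger values push further along the direction set by the prompt. Under that reading, the example's target scale of 12 weights the edit prompt much more heavily than the source scale of 3 weights the source prompt.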
Output Schema
Output
Example Execution Logs
Using seed: 1060458061
Using model: cvssp/audioldm2-music
100%|██████████| 50/50 [00:03<00:00, 12.61it/s]
100%|██████████| 22/22 [00:03<00:00, 6.44it/s]
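The two progress bars above hint at how t_start is applied. The first covers all 50 steps, and the second covers only 22. That matches 45% of 50 (22.5) rounded down. The sketch below encodes this reading as a hypothetical helper, `edit_steps`. The formula is inferred from this single log and does not come from the model's documentation or source.

```python
def edit_steps(steps: int, t_start: float) -> int:
    """Hypothetical: denoising steps run for a t_start given in percent.

    Inferred from the example logs (steps=50, t_start=45 -> 22 iterations),
    not taken from the model's source code.
    """
    if not 0 <= t_start <= 100:
        raise ValueError("t_start is expected to be a percentage in [0, 100]")
    return int(steps * t_start / 100)  # int() truncates, e.g. 22.5 -> 22
```

Under this reading, raising `steps` to 200 with the same t_start would run 90 editing steps. Lowering t_start shortens the edit pass, which fits the description that lower values stay closer to the original audio.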
Version Details
- Version ID: ff80c3cca3dc792e2cb119367dc11295b499cda961a2d0b34ecf5109b59c6528
- Version Created: March 4, 2024