lucataco/singing_voice_conversion
About
Amphion Singing Voice Conversion: DiffWaveNetSVC

Example Output
Performance Metrics
- Prediction time: 31.51 s
- Total time: 156.53 s
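The total time is about five times the prediction time. The quick calculation below splits the two figures. It treats the difference as time spent outside the prediction step itself; that this covers queueing and model setup on Replicate is an assumption, since the page does not break the total down.

```python
# Timing figures copied from the Performance Metrics above (seconds).
PREDICTION_TIME_S = 31.51
TOTAL_TIME_S = 156.53


def timing_breakdown(prediction_s: float, total_s: float) -> tuple[float, float]:
    """Return (seconds spent outside the prediction step, prediction share of total)."""
    if total_s <= 0 or prediction_s > total_s:
        raise ValueError("expected 0 < prediction_s <= total_s")
    overhead_s = total_s - prediction_s
    return overhead_s, prediction_s / total_s


overhead, share = timing_breakdown(PREDICTION_TIME_S, TOTAL_TIME_S)
print(f"outside prediction: {overhead:.2f} s, prediction share: {share:.1%}")
```

For this run that is roughly 125 s outside the prediction, with the prediction accounting for about a fifth of the total.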
All Input Parameters
{
  "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
  "target_singer": "Taylor Swift",
  "key_shift_mode": 0,
  "pitch_shift_control": "Auto Shift",
  "diffusion_inference_steps": 1000
}
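Input Parameters
- source_audio (required): Input source audio file (the example uses a WAV file URL)
- target_singer: Target singer to convert the audio to (example: "Taylor Swift")
- key_shift_mode: Key shift value (example: 0)
- pitch_shift_control: Pitch shift control (example: "Auto Shift"; in the example logs this mode rescales the source pitch by the ratio of the target and source f0 medians)
- diffusion_inference_steps: Number of diffusion inference steps (example: 1000)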
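The parameters above can be assembled into an input payload before submitting a prediction. The sketch below is a hypothetical helper (`build_svc_input` is not part of the model or any library). It accepts only the parameter names documented on this page. Its value checks are illustrative assumptions, not the model's own validation. The example values are the ones from "All Input Parameters".

```python
import json


def build_svc_input(source_audio: str, **overrides) -> dict:
    """Hypothetical helper: assemble an input dict for this model.

    Only parameter names documented on this page are accepted; the value
    checks are assumptions for illustration, not the model's own validation.
    """
    allowed = {"target_singer", "key_shift_mode", "pitch_shift_control",
               "diffusion_inference_steps"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not source_audio:
        raise ValueError("source_audio is required")
    steps = overrides.get("diffusion_inference_steps")
    # bool is a subclass of int in Python, so reject it explicitly.
    if steps is not None and (isinstance(steps, bool) or not isinstance(steps, int) or steps < 1):
        raise ValueError("diffusion_inference_steps must be a positive integer")
    return {"source_audio": source_audio, **overrides}


payload = build_svc_input(
    "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
    target_singer="Taylor Swift",
    key_shift_mode=0,
    pitch_shift_control="Auto Shift",
    diffusion_inference_steps=1000,
)
print(json.dumps(payload, indent=2))
```

A dict of this shape is what the Replicate Python client's `replicate.run` takes as its `input=` argument. The exact accepted values for `target_singer` and `pitch_shift_control` are not listed on this page.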
Output Schema
Output
Example Execution Logs
/tmp/input_audio vocalist_l1_TaylorSwift autoshift
getopt: unrecognized option '--diffusion_inference_steps'
Exprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Monotonic align not found. Please make sure you have compiled it.
There are 1 source audios:
********** Conversion for source...
Prepare for meta eval data: 0.0s
100%|██████████| 1/1 [00:01<00:00, 1.98s/it]
Prepare for acoustic features: 2.0s
Prepare for content features: 0.0s
2023-12-21 22:37:31 | INFO | inference | ========================================================
2023-12-21 22:37:31 | INFO | inference | || New inference process started. ||
2023-12-21 22:37:31 | INFO | inference | ========================================================
2023-12-21 22:37:31 | INFO | inference |
2023-12-21 22:37:31 | DEBUG | inference | Using DEBUG logging level.
2023-12-21 22:37:31 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper
2023-12-21 22:37:31 | DEBUG | inference | Vocoder dir: pretrained/bigvgan
2023-12-21 22:37:31 | DEBUG | inference | Setting random seed done in 0.83ms
2023-12-21 22:37:31 | DEBUG | inference | Random seed: 10086
2023-12-21 22:37:31 | INFO | inference | Building dataset...
2023-12-21 22:37:31 | INFO | inference | Building dataset done in 4.60ms
2023-12-21 22:37:31 | INFO | inference | Building model...
2023-12-21 22:37:31 | INFO | inference | Building model done in 276.183ms
2023-12-21 22:37:31 | INFO | inference | Initializing accelerate...
2023-12-21 22:37:32 | INFO | inference | Initializing accelerate done in 1057.268ms
2023-12-21 22:37:32 | INFO | inference | Loading checkpoint...
2023-12-21 22:37:32 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773
2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All model weights loaded successfully
2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All optimizer states loaded successfully
2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All scheduler states loaded successfully
2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully
2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All random states loaded successfully
2023-12-21 22:37:32 | INFO | accelerate.accelerator | Loading in 0 custom states
2023-12-21 22:37:32 | INFO | inference | Loading checkpoint done in 106.015ms
2023-12-21 22:37:32 | INFO | inference | Using PNDM scheduler.
Model Init: 1.5s
Auto transposing: source f0 median = 372.9, target f0 median = 286.9, factor = 0.77
100%|██████████| 1009/1009 [00:07<00:00, 130.99it/s]
Synthesis audios using bigvgan vocoder...
Loading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt
For predicted mels, #sample = 1...
Model inference: 14.1s
100%|██████████| 1/1 [00:17<00:00, 17.56s/it]
/src/Amphion/result/source/source_vocalist_l1_TaylorSwift.wav
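The "Auto transposing" line in the logs reports a factor of 0.77 for the source and target f0 medians it prints. The sketch below reproduces that number. It assumes the factor is simply the target median divided by the source median, which agrees with the logged values but is not confirmed by the page. It also expresses the ratio in equal-tempered semitones (12 · log2 of the ratio). The helper names are hypothetical.

```python
import math

# Values printed by the "Auto transposing" log line (Hz).
SOURCE_F0_MEDIAN_HZ = 372.9
TARGET_F0_MEDIAN_HZ = 286.9


def auto_shift_factor(source_median_hz: float, target_median_hz: float) -> float:
    """Assumed formula: ratio mapping the source median f0 onto the target median f0."""
    return target_median_hz / source_median_hz


def factor_to_semitones(factor: float) -> float:
    """Express a frequency ratio in equal-tempered semitones: 12 * log2(ratio)."""
    return 12 * math.log2(factor)


factor = auto_shift_factor(SOURCE_F0_MEDIAN_HZ, TARGET_F0_MEDIAN_HZ)
semitones = factor_to_semitones(factor)
print(f"factor = {factor:.2f}")  # 0.77, matching the log
print(f"shift = {semitones:.2f} semitones")
```

Under that assumption, converting the Adele example toward the Taylor Swift target lowers the pitch by roughly four and a half semitones.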
Version Details
- Version ID: f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b
- Version created: December 21, 2023
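Pinning a prediction to the version above ensures that you get the same model that produced the example output. The sketch below builds an `owner/name:version` reference string. The 64-lowercase-hex check is an assumption based on the shape of this page's version ID, not a documented Replicate rule. `pinned_ref` is a hypothetical helper.

```python
import re

MODEL = "lucataco/singing_voice_conversion"
VERSION_ID = "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b"

# Assumption: version IDs are 64 lowercase hex characters, like the one above.
_VERSION_RE = re.compile(r"[0-9a-f]{64}")


def pinned_ref(model: str, version_id: str) -> str:
    """Build an 'owner/name:version' reference pinned to one model version."""
    if model.count("/") != 1:
        raise ValueError("model must look like 'owner/name'")
    if not _VERSION_RE.fullmatch(version_id):
        raise ValueError("version_id must be 64 lowercase hex characters")
    return f"{model}:{version_id}"


print(pinned_ref(MODEL, VERSION_ID))
```

The Replicate Python client's `replicate.run` accepts a reference of this form as its first argument, together with an `input=` dict of the parameters listed above.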