meta-innovation/ultimate_rvc
About
An extension of AiCoverGen that adds several new features and improvements, making it easy to generate audio content with RVC. Ideal for people who want to add singing functionality to their AI assistant, chatbot, or VTuber.
Example Output
Output
Performance Metrics
Prediction Time: 173.74s
Total Time: 376.26s
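The two timings can be compared with a little arithmetic. The sketch below only subtracts the numbers reported on this page; the page does not say what the remainder consists of, so reading it as queueing and setup overhead is an assumption.

```python
# Values copied from the Performance Metrics on this page.
prediction_time_s = 173.74
total_time_s = 376.26

# Time not spent in prediction (assumed here to be queueing and setup).
overhead_s = round(total_time_s - prediction_time_s, 2)

# Fraction of the total wall-clock time spent in prediction.
prediction_share = prediction_time_s / total_time_s

print(f"overhead: {overhead_s:.2f}s")
print(f"prediction share of total: {prediction_share:.1%}")
```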
All Input Parameters
{
"pitch": 0,
"dry_level": 0.8,
"f0_method": "rmvpe",
"inst_gain": 0,
"main_gain": 0,
"room_size": 0.15,
"rvc_model": "https://huggingface.co/bennetJL/TaylorSwift/resolve/main/TaylorSwift2024.zip",
"wet_level": 0.2,
"index_rate": 0.51,
"song_input": "https://cdn-melody-craft.artvibe.ai/mp3/2026/01/2009539268451794944.mp3",
"backup_gain": 0,
"clean_vocals": true,
"protect_rate": 0.13,
"rms_mix_rate": 1,
"split_vocals": true,
"output_format": "mp3",
"autotune_vocals": true
}
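As a rough illustration of how parameters like these could be submitted programmatically, the sketch below builds (but does not send) an HTTP request. It assumes the model is hosted on Replicate and reachable through Replicate's predictions endpoint; the page itself does not state this. The helper name build_prediction_request and the placeholder token are hypothetical; the version ID is the one listed under Version Details.

```python
import json
import urllib.request

# Version ID as listed under Version Details on this page.
VERSION_ID = "5598e8029cbd7e9268db84ce8c2a334eab6ebccbee67b78cf63c38e964379e15"

# Assumption: the model is served through Replicate's HTTP API.
PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(inputs: dict, api_token: str) -> urllib.request.Request:
    """Build (without sending) a POST request carrying the model inputs."""
    body = json.dumps({"version": VERSION_ID, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        PREDICTIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A subset of the example parameters shown above; omitted keys are left
# to whatever defaults the model defines.
example_inputs = {
    "rvc_model": "https://huggingface.co/bennetJL/TaylorSwift/resolve/main/TaylorSwift2024.zip",
    "song_input": "https://cdn-melody-craft.artvibe.ai/mp3/2026/01/2009539268451794944.mp3",
    "f0_method": "rmvpe",
    "pitch": 0,
    "output_format": "mp3",
}

request = build_prediction_request(example_inputs, api_token="YOUR_API_TOKEN")
# To actually submit the request: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it possible to inspect the payload before spending any compute time.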
Input Parameters
- pitch
- Semitone pitch shift for the converted vocals. Positive values raise pitch, negative values lower it.
- dry_level
- Dry level of the reverb effect on converted vocals (0–1).
- f0_method
- Pitch extraction method used during vocal conversion.
- inst_gain
- Volume gain (dB) for the instrumental track.
- main_gain
- Volume gain (dB) for the converted vocal track.
- room_size
- Room size of the reverb effect on converted vocals (0–1).
- rvc_model (required)
- Zip archive containing your RVC voice model. Must include a .pth file and optionally a .index file.
- wet_level
- Wet level of the reverb effect on converted vocals (0–1).
- index_rate
- Influence of the .index file on vocal conversion (0–1). Higher values make output closer to the voice model.
- song_input (required)
- Audio file to convert. Upload a file directly or provide a URL (e.g. https://example.com/song.mp3). Supported formats: mp3, wav, flac, ogg, m4a, aac.
- backup_gain
- Volume gain (dB) for the backup vocals track.
- clean_vocals
- Apply noise reduction to the converted vocals.
- protect_rate
- Protection strength for consonants and breathing sounds (0–0.5). Lower values protect more.
- rms_mix_rate
- Blending rate of the volume envelope of the converted vocals with the original (0–1).
- split_vocals
- Split audio into segments before vocal conversion.
- output_format
- Format of the output audio file.
- autotune_vocals
- Apply autotune to the converted vocals.
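The required parameters and numeric ranges listed above can be checked before a request is sent. The following is a minimal client-side sketch, not the model's own validation logic; the names RANGES, REQUIRED and validate_inputs are hypothetical, and only the ranges stated in the descriptions are enforced.

```python
# Ranges taken from the parameter descriptions above.
RANGES = {
    "dry_level": (0.0, 1.0),
    "wet_level": (0.0, 1.0),
    "room_size": (0.0, 1.0),
    "index_rate": (0.0, 1.0),
    "rms_mix_rate": (0.0, 1.0),
    "protect_rate": (0.0, 0.5),
}

# Parameters marked "(required)" above.
REQUIRED = ("rvc_model", "song_input")


def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = [
        f"missing required parameter: {name}"
        for name in REQUIRED
        if not inputs.get(name)
    ]
    for name, (low, high) in RANGES.items():
        if name in inputs and not (low <= inputs[name] <= high):
            problems.append(f"{name}={inputs[name]} is outside [{low}, {high}]")
    return problems


good = validate_inputs(
    {"rvc_model": "model.zip", "song_input": "song.mp3", "protect_rate": 0.13}
)
bad = validate_inputs({"song_input": "song.mp3", "protect_rate": 0.8})
print(good)
print(bad)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.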
Output Schema
Output
Example Execution Logs
[~] Retrieving song...
[~] Separating vocals from instrumentals...
28.3kiB [00:00, 40.8MiB/s]
100%|██████████| 66.8M/66.8M [00:00<00:00, 67.7MiB/s]
4.38kiB [00:00, 9.68MiB/s]
15.4kiB [00:00, 28.5MiB/s]
100%|██████████| 25/25 [00:28<00:00, 1.13s/it]
100%|██████████| 20/20 [00:05<00:00, 3.88it/s]
[~] Separating main vocals from backup vocals...
100%|██████████| 52.8M/52.8M [00:00<00:00, 55.4MiB/s]
100%|██████████| 26/26 [00:21<00:00, 1.22it/s]
100%|██████████| 20/20 [00:04<00:00, 4.30it/s]
[~] De-reverbing vocals...
100%|██████████| 66.8M/66.8M [00:00<00:00, 191MiB/s]
100%|██████████| 50/50 [00:30<00:00, 1.66it/s]
100%|██████████| 38/38 [00:04<00:00, 7.85it/s]
[~] Converting vocals...
Downloading https://huggingface.co/JackismyShephard/ultimate-rvc/resolve/main/Resources/embedders/contentvec/pytorch_model.bin to /src/models/rvc/embedders/contentvec...
Downloading https://huggingface.co/JackismyShephard/ultimate-rvc/resolve/main/Resources/embedders/contentvec/config.json to /src/models/rvc/embedders/contentvec...
[~] Post-processing vocals...
[~] Pitch-shifting instrumentals...
[~] Pitch-shifting backup vocals...
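Each pipeline stage in the logs is announced by a marker of the form "[~] <stage>...". A small sketch of pulling those stage names out of raw log text is shown below; the function name extract_stages is hypothetical, and the pattern is inferred from this one example log rather than from any documented log format.

```python
import re

# Matches "[~] <stage name>..." and captures the stage name.
STAGE_PATTERN = re.compile(r"\[~\] (.+?)\.\.\.")


def extract_stages(log_text: str) -> list[str]:
    """Return the stage names announced in the log, in order of appearance."""
    return STAGE_PATTERN.findall(log_text)


# A shortened excerpt of the example log.
sample_log = (
    "[~] Retrieving song... "
    "[~] Separating vocals from instrumentals... "
    "0%| | 0.00/3.54k [00:00<?, ?iB/s] "
    "[~] Converting vocals... "
    "[~] Post-processing vocals..."
)

stages = extract_stages(sample_log)
print(stages)
```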
Version Details
- Version ID
- 5598e8029cbd7e9268db84ce8c2a334eab6ebccbee67b78cf63c38e964379e15
- Version Created
- March 4, 2026