meta-innovation/ultimate_rvc

▶️ 98 runs 📅 Mar 2026 ⚙️ Cog 0.16.12
audio-to-audio singing-voice-conversion voice-conversion

About

An extension of AiCoverGen that adds several new features and improvements, making it easy to generate audio content with RVC. Ideal for people who want to incorporate singing functionality into their AI assistant, chatbot, or VTuber.

Performance Metrics

173.74s Prediction Time
376.26s Total Time
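The two figures above differ by a wide margin. A small sketch of the arithmetic follows. The page does not say what fills the gap, so reading it as queueing and start-up time is an assumption.

```python
# Figures copied from the metrics above.
prediction_time_s = 173.74  # "Prediction Time"
total_time_s = 376.26       # "Total Time"

# Time spent outside the prediction itself (assumed: queueing and start-up).
overhead_s = round(total_time_s - prediction_time_s, 2)

# Fraction of the total wall-clock time spent in the prediction.
prediction_share = prediction_time_s / total_time_s

print(f"time outside prediction: {overhead_s:.2f}s")
print(f"prediction share of total: {prediction_share:.1%}")
```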
All Input Parameters
{
  "pitch": 0,
  "dry_level": 0.8,
  "f0_method": "rmvpe",
  "inst_gain": 0,
  "main_gain": 0,
  "room_size": 0.15,
  "rvc_model": "https://huggingface.co/bennetJL/TaylorSwift/resolve/main/TaylorSwift2024.zip",
  "wet_level": 0.2,
  "index_rate": 0.51,
  "song_input": "https://cdn-melody-craft.artvibe.ai/mp3/2026/01/2009539268451794944.mp3",
  "backup_gain": 0,
  "clean_vocals": true,
  "protect_rate": 0.13,
  "rms_mix_rate": 1,
  "split_vocals": true,
  "output_format": "mp3",
  "autotune_vocals": true
}
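As a rough illustration of how the inputs above could be submitted programmatically, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The endpoint URL, the `Bearer` authorization header, and the `{"version", "input"}` body shape are assumptions about Replicate's public HTTP API rather than something stated on this page. The helper name `build_prediction_request` and the token placeholder are hypothetical.

```python
import json
import urllib.request

# Version ID as listed under "Version Details" on this page.
VERSION_ID = "5598e8029cbd7e9268db84ce8c2a334eab6ebccbee67b78cf63c38e964379e15"

# Assumption: Replicate's HTTP endpoint for creating predictions.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token: str, inputs: dict) -> urllib.request.Request:
    """Build, without sending, a POST request that would start a prediction."""
    body = json.dumps({"version": VERSION_ID, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# The same values as the example inputs above.
example_inputs = {
    "pitch": 0,
    "dry_level": 0.8,
    "f0_method": "rmvpe",
    "inst_gain": 0,
    "main_gain": 0,
    "room_size": 0.15,
    "rvc_model": "https://huggingface.co/bennetJL/TaylorSwift/resolve/main/TaylorSwift2024.zip",
    "wet_level": 0.2,
    "index_rate": 0.51,
    "song_input": "https://cdn-melody-craft.artvibe.ai/mp3/2026/01/2009539268451794944.mp3",
    "backup_gain": 0,
    "clean_vocals": True,
    "protect_rate": 0.13,
    "rms_mix_rate": 1,
    "split_vocals": True,
    "output_format": "mp3",
    "autotune_vocals": True,
}

request = build_prediction_request("YOUR_API_TOKEN", example_inputs)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the payload easy to inspect before any credits are spent.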
Input Parameters
pitch Type: integer, Default: 0, Range: -24 to 24
Semitone pitch shift for the converted vocals. Positive values raise pitch, negative values lower it.
dry_level Type: number, Default: 0.8, Range: 0 to 1
Dry level of the reverb effect on converted vocals (0–1).
f0_method Default: rmvpe
Pitch extraction method used during vocal conversion.
inst_gain Type: integer, Default: 0, Range: -20 to 20
Volume gain (dB) for the instrumental track.
main_gain Type: integer, Default: 0, Range: -20 to 20
Volume gain (dB) for the converted vocal track.
room_size Type: number, Default: 0.15, Range: 0 to 1
Room size of the reverb effect on converted vocals (0–1).
rvc_model (required) Type: string
Zip archive containing your RVC voice model. Must include a .pth file and optionally a .index file.
wet_level Type: number, Default: 0.2, Range: 0 to 1
Wet level of the reverb effect on converted vocals (0–1).
index_rate Type: number, Default: 0.3, Range: 0 to 1
Influence of the .index file on vocal conversion (0–1). Higher values make output closer to the voice model.
song_input (required) Type: string
Audio file to convert. Upload a file directly or provide a URL (e.g. https://example.com/song.mp3). Supported formats: mp3, wav, flac, ogg, m4a, aac.
backup_gain Type: integer, Default: 0, Range: -20 to 20
Volume gain (dB) for the backup vocals track.
clean_vocals Type: boolean, Default: false
Apply noise reduction to the converted vocals.
protect_rate Type: number, Default: 0.33, Range: 0 to 0.5
Protection strength for consonants and breathing sounds (0–0.5). Lower values protect more.
rms_mix_rate Type: number, Default: 1, Range: 0 to 1
Blending rate of the volume envelope of the converted vocals with the original (0–1).
split_vocals Type: boolean, Default: false
Split audio into segments before vocal conversion.
output_format Default: mp3
Format of the output audio file.
autotune_vocals Type: boolean, Default: false
Apply autotune to the converted vocals.
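The documented ranges can be checked on the client side before a run is submitted. The sketch below is one possible way to do that. The function name `validate_inputs` and the message wording are hypothetical, and the model may apply its own checks differently.

```python
# Numeric ranges copied from the parameter list above (inclusive bounds).
NUMERIC_RANGES = {
    "pitch": (-24, 24),
    "dry_level": (0.0, 1.0),
    "inst_gain": (-20, 20),
    "main_gain": (-20, 20),
    "room_size": (0.0, 1.0),
    "wet_level": (0.0, 1.0),
    "index_rate": (0.0, 1.0),
    "backup_gain": (-20, 20),
    "protect_rate": (0.0, 0.5),
    "rms_mix_rate": (0.0, 1.0),
}

# Inputs marked "(required)" in the parameter list.
REQUIRED = ("rvc_model", "song_input")


def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in REQUIRED:
        if not inputs.get(name):
            problems.append(f"missing required input: {name}")
    for name, (low, high) in NUMERIC_RANGES.items():
        # Omitted optional inputs fall back to the documented defaults.
        if name in inputs and not (low <= inputs[name] <= high):
            problems.append(f"{name}={inputs[name]} is outside {low}..{high}")
    return problems


print(validate_inputs({"rvc_model": "model.zip", "song_input": "song.mp3", "pitch": 30}))
```

Returning a list instead of raising on the first failure lets a caller report every out-of-range value at once.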
Output Schema

Output

Type: string, Format: uri
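Because the output is a single URI string, a client usually needs to fetch the file it points to. The sketch below assumes the URI is an HTTP(S) link to the rendered audio file. The helper names `output_filename` and `download_output` are hypothetical.

```python
import posixpath
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def output_filename(uri: str, default: str = "output.mp3") -> str:
    """Derive a local file name from the path component of the output URI."""
    name = posixpath.basename(unquote(urlparse(uri).path))
    return name or default


def download_output(uri: str, directory: str = ".") -> Path:
    """Stream the file behind the output URI into `directory`."""
    target = Path(directory) / output_filename(uri)
    with urllib.request.urlopen(uri) as response, open(target, "wb") as handle:
        shutil.copyfileobj(response, handle)
    return target


# Only the name derivation runs here; download_output needs network access.
print(output_filename("https://example.com/files/cover.mp3?download=1"))
```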

Example Execution Logs
[~] Retrieving song...
[~] Separating vocals from instrumentals...
28.3kiB [00:00, 40.8MiB/s]
100%|██████████| 66.8M/66.8M [00:00<00:00, 67.7MiB/s]
4.38kiB [00:00, 9.68MiB/s]
15.4kiB [00:00, 28.5MiB/s]
100%|██████████| 25/25 [00:28<00:00,  1.13s/it]
100%|██████████| 20/20 [00:05<00:00,  3.88it/s]
[~] Separating main vocals from backup vocals...
100%|██████████| 52.8M/52.8M [00:00<00:00, 55.4MiB/s]
100%|██████████| 26/26 [00:21<00:00,  1.22it/s]
100%|██████████| 20/20 [00:04<00:00,  4.30it/s]
[~] De-reverbing vocals...
100%|██████████| 66.8M/66.8M [00:00<00:00, 191MiB/s]
100%|██████████| 50/50 [00:30<00:00,  1.66it/s]
100%|██████████| 38/38 [00:04<00:00,  7.85it/s]
[~] Converting vocals...
Downloading https://huggingface.co/JackismyShephard/ultimate-rvc/resolve/main/Resources/embedders/contentvec/pytorch_model.bin to /src/models/rvc/embedders/contentvec...
Downloading https://huggingface.co/JackismyShephard/ultimate-rvc/resolve/main/Resources/embedders/contentvec/config.json to /src/models/rvc/embedders/contentvec...
[~] Post-processing vocals...
[~] Pitch-shifting instrumentals...
[~] Pitch-shifting backup vocals...
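The log marks each pipeline stage with a line starting with `[~] `, which makes the stage order easy to extract. A minimal sketch follows. The function name `pipeline_stages` is hypothetical, and the marker format is inferred only from the log excerpt above.

```python
def pipeline_stages(log_text: str) -> list[str]:
    """Return the stage descriptions announced by lines starting with '[~] '."""
    marker = "[~] "
    stages = []
    for line in log_text.splitlines():
        if line.startswith(marker):
            # Drop the marker and the trailing ellipsis the log appends.
            stages.append(line[len(marker):].rstrip(". "))
    return stages


sample_log = """[~] Retrieving song...
[~] Separating vocals from instrumentals...
100%|██████████| 25/25 [00:28<00:00,  1.13s/it]
[~] Converting vocals...
"""

print(pipeline_stages(sample_log))
```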
Version Details
Version ID
5598e8029cbd7e9268db84ce8c2a334eab6ebccbee67b78cf63c38e964379e15
Version Created
March 4, 2026