zsxkib/create-rvc-dataset
About
Create your own Realistic Voice Cloning (RVC v2) dataset using a YouTube link

Example Output
Performance Metrics
- Prediction Time: 40.48s
- Total Time: 320.50s
All Input Parameters
{
  "audio_name": "andrew_huberman",
  "youtube_url": "https://www.youtube.com/watch?v=4b6bwcWK6GE"
}
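As a rough illustration of how the input above might be submitted, the sketch below assembles the payload and passes it to the Replicate Python client. The helper names `build_input` and `run_prediction` are hypothetical, the host allow-list is an assumption, and the version hash is the one listed under Version Details on this page. The network call needs the `replicate` package and a `REPLICATE_API_TOKEN` environment variable, so it sits in its own function.

```python
from typing import Optional
from urllib.parse import urlparse

# Model name and version hash as listed on this page.
MODEL_REF = (
    "zsxkib/create-rvc-dataset:"
    "c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff"
)

# Assumption: these hosts cover the usual YouTube link forms.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def build_input(youtube_url: str, audio_name: Optional[str] = None) -> dict:
    """Assemble the input payload (hypothetical helper, not part of the model)."""
    host = urlparse(youtube_url).netloc.lower()
    if host not in YOUTUBE_HOSTS:
        raise ValueError(f"not a recognised YouTube URL: {youtube_url!r}")
    payload = {"youtube_url": youtube_url}
    # audio_name is not marked as required, so it is only sent when given.
    if audio_name is not None:
        payload["audio_name"] = audio_name
    return payload


def run_prediction(payload: dict):
    """Run the hosted model; requires `pip install replicate` and an API token."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_prediction(build_input("https://www.youtube.com/watch?v=4b6bwcWK6GE", "andrew_huberman"))` would submit the same input as the example on this page.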
Input Parameters
- audio_name: Name of the dataset. The output is a zip file containing a folder named `dataset/<audio_name>/`. The folder holds multiple `.mp3` files named `split_<i>.mp3`, each a short audio clip extracted from the provided YouTube video with the voice isolated from background noise.
- youtube_url (required): URL of the YouTube video to build the RVC v2 dataset from.
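The output layout described above can be checked after download with a short script. This is a minimal sketch: `list_splits` is a hypothetical helper, and it assumes that `dataset/` sits at the root of the archive and that `<i>` is a plain integer, neither of which this page confirms.

```python
import re
import zipfile

# Assumed member layout: dataset/<audio_name>/split_<i>.mp3
SPLIT_RE = re.compile(r"^dataset/(?P<name>[^/]+)/split_(?P<index>\d+)\.mp3$")


def list_splits(zip_file, audio_name):
    """Return the clip paths for `audio_name`, ordered by numeric index.

    `zip_file` may be a filesystem path or a binary file-like object.
    """
    matches = []
    with zipfile.ZipFile(zip_file) as archive:
        for member in archive.namelist():
            found = SPLIT_RE.match(member)
            if found and found.group("name") == audio_name:
                matches.append((int(found.group("index")), member))
    # Sort numerically so split_10 comes after split_2, not before it.
    return [path for _, path in sorted(matches)]
```

For example, `list_splits("output.zip", "andrew_huberman")` would return the clip paths in playback order, where `output.zip` stands in for wherever the downloaded archive was saved.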
Output Schema
Output
Example Execution Logs
[youtube] Extracting URL: https://www.youtube.com/watch?v=4b6bwcWK6GE
[youtube] 4b6bwcWK6GE: Downloading webpage
[youtube] 4b6bwcWK6GE: Downloading ios player API JSON
[youtube] 4b6bwcWK6GE: Downloading android player API JSON
[youtube] 4b6bwcWK6GE: Downloading m3u8 information
[info] 4b6bwcWK6GE: Downloading 1 format(s): 251
[download] Destination: youtubeaudio/andrew_huberman
[download] 0.0% of 3.74MiB at Unknown B/s ETA Unknown
...
[download] 100.0% of 3.74MiB at 38.68MiB/s ETA 00:00
[download] 100% of 3.74MiB in 00:00:00 at 17.54MiB/s
[ExtractAudio] Destination: youtubeaudio/andrew_huberman.wav
Deleting original file youtubeaudio/andrew_huberman (pass -k to keep)
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
0%| | 0.00/80.2M [00:00<?, ?B/s]
...
100%|██████████| 80.2M/80.2M [00:00<00:00, 135MB/s]
0%| | 0.0/263.25 [00:00<?, ?seconds/s]
...
100%|██████████████████████████████████████████████████████████████████████| 263.25/263.25 [00:17<00:00, 15.18seconds/s]
Important: the default model was recently changed to `htdemucs`, the latest Hybrid Transformer Demucs model.
In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /src/separated/htdemucs
Separating track youtubeaudio/andrew_huberman.wav
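The logs above suggest two stages: an audio download with extraction to WAV, followed by source separation with the `htdemucs` Demucs model. The sketch below builds command lines that would produce similar log output with the `yt-dlp` and `demucs` CLIs. It is an assumption about the pipeline, not the model's actual source: the `--two-stems vocals` option and the helper names are illustrative guesses, and the later step that cuts the vocals into `split_<i>.mp3` clips is not covered because the logs do not show it.

```python
import shlex


def download_command(youtube_url, audio_name):
    """yt-dlp invocation: fetch the best audio stream and convert it to WAV."""
    return [
        "yt-dlp",
        "-f", "bestaudio",
        "-x", "--audio-format", "wav",
        # Output template without extension; the extractor appends .wav
        "-o", f"youtubeaudio/{audio_name}",
        youtube_url,
    ]


def separation_command(audio_name):
    """Demucs invocation: isolate vocals using the htdemucs model."""
    return [
        "demucs",
        "-n", "htdemucs",
        "--two-stems", "vocals",  # assumption: only the vocal stem is needed
        f"youtubeaudio/{audio_name}.wav",
    ]


def as_shell(command):
    """Render a command list as a copy-pasteable shell line."""
    return shlex.join(command)
```

Each list can be passed to `subprocess.run(command, check=True)` when both tools are installed. By default Demucs writes its results under `separated/<model name>/`, which is consistent with the `/src/separated/htdemucs` path in the logs.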
Version Details
- Version ID: c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff
- Version Created: November 20, 2023