zsxkib/create-rvc-dataset 🔊📝 → 🖼️

▶️ 15.4K runs 📅 Nov 2023 ⚙️ Cog 0.8.6 🔗 GitHub 📄 Paper
dataset-preparation music-source-separation voice-cloning

About

Create your own Realistic Voice Cloning (RVC v2) dataset using a YouTube link

Example Output

Performance Metrics

40.48s Prediction Time
320.50s Total Time
All Input Parameters
{
  "audio_name": "andrew_huberman",
  "youtube_url": "https://www.youtube.com/watch?v=4b6bwcWK6GE"
}
Input Parameters
audio_name Type: string Default: rvc_v2_voices
Name of the dataset. The output is a zip file containing a folder named `dataset/<audio_name>/`. This folder holds multiple `.mp3` files named `split_<i>.mp3`. Each `split_<i>.mp3` file is a short audio clip extracted from the provided YouTube video, with the voice isolated from background noise.
youtube_url (required) Type: string
URL of the YouTube video to build your RVC v2 dataset from.
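The two parameters above can be assembled and submitted with the `replicate` Python client. The following is a minimal sketch, not the author's code: `build_input` and `run_prediction` are hypothetical helper names, the client must be installed separately, and `REPLICATE_API_TOKEN` is assumed to be set in the environment. The version hash is the one listed under Version Details.

```python
from urllib.parse import urlparse

MODEL = "zsxkib/create-rvc-dataset"
VERSION = "c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff"


def build_input(youtube_url, audio_name="rvc_v2_voices"):
    """Build the input payload (hypothetical helper, not part of the model)."""
    parsed = urlparse(youtube_url)
    # Basic sanity check only; the model itself decides what it accepts.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {youtube_url!r}")
    if not audio_name:
        raise ValueError("audio_name must be a non-empty string")
    return {"audio_name": audio_name, "youtube_url": youtube_url}


def run_prediction(youtube_url, audio_name="rvc_v2_voices"):
    """Submit a prediction; needs network access and an API token."""
    import replicate  # third-party client, imported lazily

    return replicate.run(
        f"{MODEL}:{VERSION}",
        input=build_input(youtube_url, audio_name),
    )
```

Calling `run_prediction("https://www.youtube.com/watch?v=4b6bwcWK6GE", "andrew_huberman")` would reproduce the example inputs shown on this page. Depending on the client version, the return value may be a URL string or a file-like wrapper around it.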
Output Schema

Output

Type: string Format: uri
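Since the output is a URI pointing at a zip archive, a client will usually want to check which clips it contains. The sketch below assumes the layout described for `audio_name` (`dataset/<audio_name>/split_<i>.mp3`) and further assumes that `<i>` is a decimal integer, which the page does not state. `list_splits` is a hypothetical helper name.

```python
import io
import re
import zipfile

# Layout taken from the audio_name description; a numeric <i> is an assumption.
SPLIT_PATTERN = re.compile(r"^dataset/(?P<name>[^/]+)/split_(?P<index>\d+)\.mp3$")


def list_splits(zip_bytes):
    """Map each dataset name in the archive to its sorted clip indices."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            match = SPLIT_PATTERN.match(member)
            if match:
                found.setdefault(match["name"], []).append(int(match["index"]))
    return {name: sorted(indices) for name, indices in found.items()}
```

The archive bytes can be fetched from the output URI with any HTTP client, for example `urllib.request.urlopen(uri).read()`, and passed straight to `list_splits`.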

Example Execution Logs
[youtube] Extracting URL: https://www.youtube.com/watch?v=4b6bwcWK6GE
[youtube] 4b6bwcWK6GE: Downloading webpage
[youtube] 4b6bwcWK6GE: Downloading ios player API JSON
[youtube] 4b6bwcWK6GE: Downloading android player API JSON
[youtube] 4b6bwcWK6GE: Downloading m3u8 information
[info] 4b6bwcWK6GE: Downloading 1 format(s): 251
[download] Destination: youtubeaudio/andrew_huberman
[download] 100% of    3.74MiB in 00:00:00 at 17.54MiB/s
[ExtractAudio] Destination: youtubeaudio/andrew_huberman.wav
Deleting original file youtubeaudio/andrew_huberman (pass -k to keep)
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|██████████| 80.2M/80.2M [00:00<00:00, 135MB/s]
100%|██████████████████████████████████████████████████████████████████████| 263.25/263.25 [00:17<00:00, 15.18seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /src/separated/htdemucs
Separating track youtubeaudio/andrew_huberman.wav
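The logs above suggest two stages: a yt-dlp audio download converted to WAV, followed by Demucs separation with the `htdemucs` model. The sketch below builds roughly equivalent command lines. It is an approximation inferred from the logs, not the model's actual code: the helper names are hypothetical, and `-f bestaudio` and `--two-stems vocals` in particular are assumptions.

```python
def download_command(youtube_url, audio_name):
    """yt-dlp invocation that extracts audio to youtubeaudio/<audio_name>.wav."""
    return [
        "yt-dlp",
        "-f", "bestaudio",            # assumption: logs only show format 251
        "-x", "--audio-format", "wav",
        "-o", f"youtubeaudio/{audio_name}",
        youtube_url,
    ]


def separation_command(audio_name):
    """Demucs invocation writing stems under separated/htdemucs/."""
    return [
        "demucs",
        "--two-stems", "vocals",      # assumption: keep vocals vs. the rest
        "-n", "htdemucs",
        "-o", "separated",
        f"youtubeaudio/{audio_name}.wav",
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with both tools installed. How the isolated vocal track is then cut into `split_<i>.mp3` clips is not shown in the logs, so it is left out here.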
Version Details
Version ID
c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff
Version Created
November 20, 2023