datong-new/vc 📝🖼️❓ → ❓

โ–ถ๏ธ 74 runs ๐Ÿ“… Apr 2024 โš™๏ธ Cog 0.9.4
text-to-speech voice-cloning

Example Output

Output

{"audio":"https://replicate.delivery/pbxt/FF1JjIkpSSbuIZYEqapFWplwlSGppiVEsfdVmPmG5eelQwdlA/output.wav"}

Performance Metrics

13.70s Prediction Time
119.75s Total Time
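The two figures above differ by a wide margin. A small arithmetic sketch makes the gap explicit; what the extra time consists of is not stated on this page (queueing and model startup are a plausible assumption, nothing more).

```python
# Figures copied from the "Performance Metrics" section above.
prediction_time_s = 13.70
total_time_s = 119.75

# Time spent outside the prediction itself (cause not stated on this page).
overhead_s = round(total_time_s - prediction_time_s, 2)
# Fraction of the total that was actual prediction work.
share = prediction_time_s / total_time_s

print(f"overhead: {overhead_s:.2f}s, prediction share: {share:.1%}")
# prints "overhead: 106.05s, prediction share: 11.4%"
```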
All Input Parameters
{
  "text": "因此，没有一个固定的答案来描述所有情况下编码后字符串的具体长度，它会根据你需要编码的原始数据的大小而有所不同。对于具体情况下的精确长度，需要根据实际的原始数据和编码规则来确定。",
  "refer_video": "https://replicate.delivery/pbxt/Kp95btr4Xmaj6eWm7zpQVyO5kAeEtfPvt5jL22TkYxq0H2Dr/myvoice.wav",
  "text_language": "中文",
  "video_language": "中文"
}
Input Parameters
text (required) Type: string
refer_video (required) Type: string
Path or URL of the reference recording whose voice is cloned. Despite the parameter name, the example above passes a .wav audio file.
text_language Default: 中文
Language of the text to be synthesized.
video_language Default: 中文
Language spoken in the reference recording.
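The parameters above can be assembled into a prediction request. The following is a minimal sketch, assuming Replicate's public HTTP endpoint `https://api.replicate.com/v1/predictions` with bearer-token authentication and a token stored in a `REPLICATE_API_TOKEN` environment variable; none of these appear on this page. The helper name `build_prediction_request` and the sample `text` are hypothetical. The version ID and the `refer_video` URL are the ones shown on this page.

```python
import json
import os
import urllib.request

# Version ID listed under "Version Details" on this page.
VERSION_ID = "c0edf71c5242cb90457701a2f44292bfc12c333aab31878ab57ee164bfc07259"
# Assumed endpoint of Replicate's HTTP API; not stated on this page.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(text, refer_video, text_language="中文",
                             video_language="中文", token=None):
    """Build (but do not send) a POST request for one prediction."""
    payload = {
        "version": VERSION_ID,
        "input": {
            "text": text,
            "refer_video": refer_video,
            "text_language": text_language,
            "video_language": video_language,
        },
    }
    # Assumed convention: the API token lives in an environment variable.
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_prediction_request(
    text="你好，世界。",  # placeholder text, not taken from this page
    refer_video="https://replicate.delivery/pbxt/Kp95btr4Xmaj6eWm7zpQVyO5kAeEtfPvt5jL22TkYxq0H2Dr/myvoice.wav",
)
print(request.get_method(), request.full_url)

# Sending is left out on purpose; with a valid token it would look like:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```

Both language parameters are optional and fall back to the defaults listed above, so only `text` and `refer_video` have to be supplied.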
Output Schema
audio Type: string Format: uri
Audio
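Because the output is a single URI, retrieving the result is an ordinary file download. A minimal sketch using only the standard library; the helper names `output_filename` and `download_audio` are hypothetical, and the URL is the example output shown on this page.

```python
import posixpath
import urllib.parse
import urllib.request


def output_filename(prediction_output):
    """Return the file name at the end of the `audio` URI."""
    path = urllib.parse.urlsplit(prediction_output["audio"]).path
    return posixpath.basename(path)


def download_audio(prediction_output, target=None):
    """Fetch the audio file and return the local path (needs network access)."""
    target = target or output_filename(prediction_output)
    urllib.request.urlretrieve(prediction_output["audio"], target)
    return target


example = {
    "audio": "https://replicate.delivery/pbxt/FF1JjIkpSSbuIZYEqapFWplwlSGppiVEsfdVmPmG5eelQwdlA/output.wav"
}
print(output_filename(example))  # prints "output.wav"
```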
Example Execution Logs
prompt_text ๅฏนไบŽๆˆ‘็š„็”ต่ฏ,ไป–ๅ›ž็š„่ถŠๆฅ่ถŠๅฐ‘,็œ‹ๅ‘็ก็ƒŸ็š„็œผ็ฅžๅด่ถŠๆฅ่ถŠๆธฉๆŸ”,ไป–ไปฌๅตไบ†ๅ››ๅนด็š„CP,ไปŽๆœชๆ‰ฟ่ฎค,ๅดไธ€็›ดๆšงๆ˜งใ€‚
INFO:     127.0.0.1:37136 - "GET /?refer_wav_path=%2Ftmp%2Ftmpun1yp9m7myvoice.wav&prompt_text=%E5%AF%B9%E4%BA%8E%E6%88%91%E7%9A%84%E7%94%B5%E8%AF%9D%2C%E4%BB%96%E5%9B%9E%E7%9A%84%E8%B6%8A%E6%9D%A5%E8%B6%8A%E5%B0%91%2C%E7%9C%8B%E5%90%91%E7%A1%9D%E7%83%9F%E7%9A%84%E7%9C%BC%E7%A5%9E%E5%8D%B4%E8%B6%8A%E6%9D%A5%E8%B6%8A%E6%B8%A9%E6%9F%94%2C%E4%BB%96%E4%BB%AC%E5%90%B5%E4%BA%86%E5%9B%9B%E5%B9%B4%E7%9A%84CP%2C%E4%BB%8E%E6%9C%AA%E6%89%BF%E8%AE%A4%2C%E5%8D%B4%E4%B8%80%E7%9B%B4%E6%9A%A7%E6%98%A7%E3%80%82&prompt_language=all_zh&text=%E5%9B%A0%E6%AD%A4%EF%BC%8C%E6%B2%A1%E6%9C%89%E4%B8%80%E4%B8%AA%E5%9B%BA%E5%AE%9A%E7%9A%84%E7%AD%94%E6%A1%88%E6%9D%A5%E6%8F%8F%E8%BF%B0%E6%89%80%E6%9C%89%E6%83%85%E5%86%B5%E4%B8%8B%E7%BC%96%E7%A0%81%E5%90%8E%E5%AD%97%E7%AC%A6%E4%B8%B2%E7%9A%84%E5%85%B7%E4%BD%93%E9%95%BF%E5%BA%A6%EF%BC%8C%E5%AE%83%E4%BC%9A%E6%A0%B9%E6%8D%AE%E4%BD%A0%E9%9C%80%E8%A6%81%E7%BC%96%E7%A0%81%E7%9A%84%E5%8E%9F%E5%A7%8B%E6%95%B0%E6%8D%AE%E7%9A%84%E5%A4%A7%E5%B0%8F%E8%80%8C%E6%9C%89%E6%89%80%E4%B8%8D%E5%90%8C%E3%80%82%E5%AF%B9%E4%BA%8E%E5%85%B7%E4%BD%93%E6%83%85%E5%86%B5%E4%B8%8B%E7%9A%84%E7%B2%BE%E7%A1%AE%E9%95%BF%E5%BA%A6%EF%BC%8C%E9%9C%80%E8%A6%81%E6%A0%B9%E6%8D%AE%E5%AE%9E%E9%99%85%E7%9A%84%E5%8E%9F%E5%A7%8B%E6%95%B0%E6%8D%AE%E5%92%8C%E7%BC%96%E7%A0%81%E8%A7%84%E5%88%99%E6%9D%A5%E7%A1%AE%E5%AE%9A%E3%80%82&text_language=all_zh HTTP/1.1" 200 OK
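The request line above is percent-encoded, which makes it hard to see which values reached the internal server. A minimal sketch for decoding such a line with the standard library; the helper name `decode_access_log_query` is hypothetical, and the sample is a shortened stand-in for the real line (its `text` value is cut to the first two characters).

```python
from urllib.parse import parse_qs, urlsplit


def decode_access_log_query(log_line):
    """Extract the quoted request from an access-log line and decode its query."""
    request = log_line.split('"')[1]  # e.g. 'GET /?a=b HTTP/1.1'
    method, target, protocol = request.split(" ")
    params = parse_qs(urlsplit(target).query)
    # parse_qs returns a list per key; each key appears once in this log.
    return {key: values[0] for key, values in params.items()}


# Shortened stand-in for the log line above.
sample = (
    'INFO:     127.0.0.1:37136 - "GET /?refer_wav_path=%2Ftmp%2Ftmpun1yp9m7myvoice.wav'
    '&prompt_language=all_zh&text=%E5%9B%A0%E6%AD%A4&text_language=all_zh HTTP/1.1" 200 OK'
)
for key, value in decode_access_log_query(sample).items():
    print(f"{key} = {value}")
```

Decoded this way, the log shows the reference file arriving as `refer_wav_path` and both language inputs arriving as `all_zh`.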
Building prefix dict from the default dictionary ...
DEBUG:jieba_fast:Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
DEBUG:jieba_fast:Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.631 seconds.
Prefix dict has been built succesfully.
DEBUG:jieba_fast:Loading model cost 0.631 seconds.
DEBUG:jieba_fast:Prefix dict has been built succesfully.
  0%|          | 0/1500 [00:00<?, ?it/s]
 40%|███▉      | 593/1500 [00:06<00:09, 93.87it/s]
T2S Decoding EOS [356 -> 963]
 40%|████      | 603/1500 [00:06<00:09, 93.75it/s]
 40%|████      | 606/1500 [00:06<00:09, 92.30it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/torch/functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:863.)
return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
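The `huggingface/tokenizers` warning in the log names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch of doing so from Python; where this would go in the model's own code is not shown on this page, so the placement is an assumption.

```python
import os

# Must run before `tokenizers` (or any library that imports it) is first used.
# setdefault keeps a value that is already exported in the environment.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")

print(os.environ["TOKENIZERS_PARALLELISM"])
```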
Version Details
Version ID
c0edf71c5242cb90457701a2f44292bfc12c333aab31878ab57ee164bfc07259
Version Created
May 1, 2024