suminhthanh/vixtts

532 runs · Apr 2024 · Cog 0.9.4

text-to-speech voice-cloning

Performance

18.0 s typical run time

~346 s cold start (first call)

532 total runs

About

viⓍTTS (vixTTS) is a generative speech model that lets you clone a voice across different languages using only a short 6-second audio clip.
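Because cloning depends on a reference clip of at least 6 seconds, it can help to check the clip length before submitting it. The sketch below is a client-side assumption, not part of the model: it uses Python's standard `wave` module, so it only handles uncompressed WAV. The other accepted formats (mp3, m4a, ogg, flv) would need decoding first, for example with ffmpeg. The helper names are hypothetical.

```python
import io
import wave

MIN_SECONDS = 6.0  # minimum reference length stated for the `speaker` input


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a PCM WAV file given as raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(data: bytes, minimum: float = MIN_SECONDS) -> bool:
    """True if the WAV clip meets the minimum reference duration."""
    return wav_duration_seconds(data) >= minimum


def make_silent_wav(seconds: float, rate: int = 24000) -> bytes:
    """Build a silent mono 16-bit WAV in memory (hypothetical test helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(round(seconds * rate)))
    return buf.getvalue()


if __name__ == "__main__":
    clip = make_silent_wav(14.81)  # same length as the example speaker clip in the logs
    print(round(wav_duration_seconds(clip), 2), is_long_enough(clip))
```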

Example Output

Output

{"path":"https://replicate.delivery/pbxt/aMRh9enS4SSEMadjbWCfA509aJKPIXlYi8oqgRRwPeVtr9slA/0521061310_hanh_phuc_luon_la_niem_khao_khat_lon_nhat_cua_con__tmp.wav"}
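The output file name appears to combine a `MMDDHHMMSS` timestamp with an ASCII slug of the start of the input text. The sketch below is a hypothetical reconstruction inferred only from this one example: a 50-character truncation, Vietnamese diacritics stripped, non-alphanumerics replaced by `_`, and a `_tmp.wav` suffix. It reproduces the example name, but the model's actual naming code is not shown and may differ.

```python
import re
import unicodedata
from datetime import datetime


def ascii_slug(text: str, max_chars: int = 50) -> str:
    """Fold text to lowercase ASCII with underscores (hypothetical rule)."""
    head = text[:max_chars]
    # 'đ'/'Đ' have no Unicode decomposition, so map them explicitly.
    head = head.replace("đ", "d").replace("Đ", "D")
    # NFKD splits letters like 'ạ' into 'a' + a combining mark; drop the marks.
    decomposed = unicodedata.normalize("NFKD", head)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "_", stripped.lower())


def output_filename(text: str, when: datetime) -> str:
    """Hypothetical: timestamp + '_' + slug + '_tmp.wav'."""
    return f"{when:%m%d%H%M%S}_{ascii_slug(text)}_tmp.wav"


if __name__ == "__main__":
    first = "Hạnh phúc luôn là niềm khao khát lớn nhất của con người."
    print(output_filename(first, datetime(2024, 5, 21, 6, 13, 10)))
```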

Performance Metrics

17.98s Prediction Time

345.77s Total Time
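The two timings above, together with the `run_tts` duration in the execution logs, show where the time went in this run. The arithmetic below uses only those reported figures. The large share outside the prediction is consistent with this being a cold start, but that reading is an interpretation rather than something the page states.

```python
prediction_s = 17.98    # "Prediction Time" for the example run
total_s = 345.77        # "Total Time" for the same run
run_tts_ms = 17469.36   # "Duration to run_tts" from the execution logs

overhead_s = total_s - prediction_s            # wall time spent outside the prediction
prediction_share = prediction_s / total_s      # fraction of wall time spent predicting
non_tts_s = prediction_s - run_tts_ms / 1000   # prediction time not inside run_tts

print(f"overhead: {overhead_s:.2f} s")
print(f"prediction share: {prediction_share:.1%}")
print(f"outside run_tts: {non_tts_s:.2f} s")
```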

All Input Parameters

{
  "text": "Hạnh phúc luôn là niềm khao khát lớn nhất của con người. Tùy vào hiểu biết của mỗi người qua từng xã hội và từng thời đại, mà hạnh phúc được quan niệm một cách khác nhau. Những người cứ gặp phải xui rủi triền miên, nên họ quả quyết rằng trên đời này làm gì có hạnh phúc. Còn những người trẻ thì cứ mơ mộng hạnh phúc chắc hẳn rất tuyệt diệu và tin rằng nó chỉ nằm ở cuối con đường mình đang đi. Và hằng bao lớp người đã đi gần hết kiếp nhân sinh mà vẫn đuổi theo hạnh phúc như trò chơi cút bắt: có khi tóm được nó thì nó lại tan biến, có khi ngỡ mình tay trắng thì lại thấy nó chợt hiện về. Mặc dù ai cũng mong muốn có hạnh phúc, nhưng khi được hỏi hạnh phúc là gì thì phần lớn mọi người đều rất lúng túng. Họ định nghĩa một cách rất mơ hồ, hoặc chỉ mỉm cười trong mặc cảm.",
  "speaker": "https://replicate.delivery/pbxt/KibHoI1aA7kYweYgeSV2fFOY67QwEuZNe5l1tFX7Z6FkaEoi/samples_nu-luu-loat.wav",
  "language": "vi",
  "cleanup_voice": true,
  "normalize_text": true,
  "use_deepfilter": true
}
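The execution logs further down show the `text` value split into seven sentences before synthesis. The model's actual splitter is not shown. The simple rule sketched here, which splits on whitespace that follows `.`, `!` or `?`, is an assumption. On this example text it yields the same seven sentences, and like the logged split it does not break at the colon in "cút bắt:".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (simplified assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


if __name__ == "__main__":
    sample = "Xin chào các bạn. Hạnh phúc luôn là niềm khao khát lớn nhất của con người."
    for sentence in split_sentences(sample):
        print(sentence)
```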

Input Parameters

text (string, default: "Xin chào các bạn", Vietnamese for "Hello everyone"): Text to synthesize.
speaker (string, required): Original speaker audio (wav, mp3, m4a, ogg, or flv). Duration should be at least 6 seconds.
language (default: vi): Output language of the synthesized speech.
bucket_name (string): AWS S3 bucket name.
cleanup_voice (boolean, default: true): Whether to apply denoising to the speaker audio (for microphone recordings).
normalize_text (boolean, default: true): Whether to normalize the input text.
use_deepfilter (boolean, default: true): Whether to enhance the speaker audio with DeepFilter (the example logs show DeepFilterNet3 processing the speaker file).
cdn_download_url (string): CDN download URL.
aws_access_key_id (string): AWS access key ID.
aws_secret_access_key (string): AWS secret access key.
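A minimal sketch of assembling an input payload with the parameter names and defaults listed above. The extension check is an assumed client-side convenience, not something the model is documented to do, and the S3/CDN parameters are left out because their behavior is not described. The function names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed for the `speaker` input.
SPEAKER_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".flv"}


def is_supported_speaker(url: str) -> bool:
    """True if the URL path ends in one of the listed audio extensions."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in SPEAKER_EXTENSIONS


def build_input(speaker: str, text: str = "Xin chào các bạn", language: str = "vi",
                cleanup_voice: bool = True, normalize_text: bool = True,
                use_deepfilter: bool = True) -> dict:
    """Assemble an input dict using the documented names and defaults."""
    if not is_supported_speaker(speaker):
        raise ValueError(f"unsupported speaker audio format: {speaker}")
    return {
        "text": text,
        "speaker": speaker,
        "language": language,
        "cleanup_voice": cleanup_voice,
        "normalize_text": normalize_text,
        "use_deepfilter": use_deepfilter,
    }
```

Such a dict could then be passed as `input=` to the Replicate Python client's `replicate.run`, pinned to the version ID listed under Version Details. That call needs network access and an API token, so it is not shown here.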

Output Schema

path (Path): URL of the generated WAV file, as in the example output.

Example Execution Logs

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '/tmp/tmpw92d613fsamples_nu-luu-loat.wav':
Metadata:
encoder         : Lavf58.76.100
Duration: 00:00:14.81, bitrate: 384 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, mono, s16, 384 kb/s
Output /tmp/tmpw92d613fsamples_nu-luu-loat.wav same as Input #0 - exiting
FFmpeg cannot edit existing files in-place.
Invalidating cache /tmp/tmp7vy7zbnisamples_nu-luu-loat.wav
Running filter...
/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/df/io.py:9: UserWarning: `torchaudio.backend.common.AudioMetaData` has been moved to `torchaudio.AudioMetaData`. Please update the import path.
from torchaudio.backend.common import AudioMetaData
2024-05-21 06:12:56 | INFO     | DF | Running on torch 2.3.0+cu121
2024-05-21 06:12:56 | INFO     | DF | Running on host model-vp-29b957e23d9609f73f02c51c6cb66207-7b7bfb9494-k5ckz
2024-05-21 06:12:56 | INFO     | DF | Git commit: aabaa89, branch: master
2024-05-21 06:12:56 | INFO     | DF | Loading model settings of DeepFilterNet3
2024-05-21 06:12:56 | INFO     | DF | Using DeepFilterNet3 model at /root/.cache/DeepFilterNet/DeepFilterNet3
2024-05-21 06:12:56 | INFO     | DF | Initializing model `deepfilternet3`
2024-05-21 06:12:56 | INFO     | DF | Found checkpoint /root/.cache/DeepFilterNet/DeepFilterNet3/checkpoints/model_120.ckpt.best with epoch 120
2024-05-21 06:12:56 | INFO     | DF | Running on device cuda:0
2024-05-21 06:12:56 | INFO     | DF | Model loaded
2024-05-21 06:12:57 | WARNING  | DF | Audio sampling rate does not match model sampling rate (24000, 48000). Resampling...
/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/df/io.py:106: UserWarning: "sinc_interpolation" resampling method name is being deprecated and replaced by "sinc_interp_hann" in the next release. The default behavior remains unchanged.
return ta_resample(audio, orig_sr, new_sr, **params)
/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv2d(input, weight, bias, self.stride,
2024-05-21 06:12:58 | INFO     | DF | Enhanced noisy audio file 'tmpw92d613fsamples_nu-luu-loat.wav' in 0.46s (RT factor: 0.031)
/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/df/io.py:106: UserWarning: "sinc_interpolation" resampling method name is being deprecated and replaced by "sinc_interp_hann" in the next release. The default behavior remains unchanged.
return ta_resample(audio, orig_sr, new_sr, **params)
Computing conditioning latents...
['Hạnh phúc luôn là niềm khao khát lớn nhất của con người.',
'Tùy vào hiểu biết của mỗi người qua từng xã hội và từng thời đại, mà hạnh '
'phúc được quan niệm một cách khác nhau.',
'Những người cứ gặp phải xui rủi triền miên, nên họ quả quyết rằng trên đời '
'này làm gì có hạnh phúc.',
'Còn những người trẻ thì cứ mơ mộng hạnh phúc chắc hẳn rất tuyệt diệu và tin '
'rằng nó chỉ nằm ở cuối con đường mình đang đi.',
'Và hằng bao lớp người đã đi gần hết kiếp nhân sinh mà vẫn đuổi theo hạnh '
'phúc như trò chơi cút bắt: có khi tóm được nó thì nó lại tan biến, có khi '
'ngỡ mình tay trắng thì lại thấy nó chợt hiện về.',
'Mặc dù ai cũng mong muốn có hạnh phúc, nhưng khi được hỏi hạnh phúc là gì '
'thì phần lớn mọi người đều rất lúng túng.',
'Họ định nghĩa một cách rất mơ hồ, hoặc chỉ mỉm cười trong mặc cảm.']
Saving output to  /src/output/0521061310_hanh_phuc_luon_la_niem_khao_khat_lon_nhat_cua_con__tmp.wav
Duration to run_tts: 17469.36 ms
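Several figures in these logs can be cross-checked with simple arithmetic. Uncompressed PCM bitrate is sample rate × bits per sample × channels. DeepFilterNet's "RT factor" appears to be processing time divided by audio duration: 0.46 s / 14.81 s ≈ 0.031, which matches the logged value. The sample counts show the effect of the 24000 → 48000 Hz resampling warning. This sketch uses only values that appear in the logs.

```python
def pcm_bitrate_kbps(sample_rate: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kb/s (1 kb = 1000 bits)."""
    return sample_rate * bits_per_sample * channels / 1000


def rt_factor(processing_s: float, audio_s: float) -> float:
    """Processing time / audio duration; below 1 means faster than real time."""
    return processing_s / audio_s


speaker_s = 14.81  # speaker clip duration reported by ffmpeg
print(pcm_bitrate_kbps(24000, 16, 1))                      # pcm_s16le, 24000 Hz, mono
print(round(rt_factor(0.46, speaker_s), 3))                # DeepFilterNet enhancement
print(round(speaker_s * 24000), round(speaker_s * 48000))  # samples before/after resampling
```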

Version Details

Version ID: 5222190b47dfb128cd588f07dadb78107aa489bdcd0af45814d7841d47f608c6
Version Created: May 27, 2024
