zsxkib/hololive-style-bert-vits2

886 runs | Jun 2024 | Cog 0.9.8
Tags: audio-to-audio, speech-style-transfer, text-to-speech

About

🎙️ Hololive text-to-speech and voice-to-voice (Japanese 🇯🇵 + English 🇬🇧)

Example Output

[Audio: example output generated with the inputs below]

Performance Metrics

Prediction time: 4.81 s
Total time: 275.19 s

All Input Parameters
{
  "style": "Neutral",
  "speaker": "EN_MoriCalliope",
  "use_tone": false,
  "sdp_ratio": 0.2,
  "line_split": true,
  "style_text": "",
  "text_input": "Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!",
  "noise_scale": 0.6,
  "length_scale": 1,
  "style_weight": 5,
  "noise_scale_w": 0.8,
  "split_interval": 0.5,
  "use_style_text": false,
  "style_text_weight": 0.7
}
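A minimal Python sketch, assuming you call the model through Replicate's Python client: it merges your overrides into the defaults shown above and rejects unknown keys before anything is sent. The helper name `build_input` and the boolean check are illustrative, not part of the model; `MODEL_REF` combines the model name with the version ID listed under Version Details.

```python
from typing import Any

# "owner/name:version" reference, using the version ID from Version Details.
MODEL_REF = (
    "zsxkib/hololive-style-bert-vits2:"
    "595ac4205eb84ba9330f178f2f2e4460f9ad9b67bcc8e744a7d8339f01ff24d4"
)

# Defaults copied from the "All Input Parameters" example above.
DEFAULTS: dict[str, Any] = {
    "style": "Neutral",
    "speaker": "EN_MoriCalliope",
    "use_tone": False,
    "sdp_ratio": 0.2,
    "line_split": True,
    "style_text": "",
    "text_input": "Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!",
    "noise_scale": 0.6,
    "length_scale": 1,
    "style_weight": 5,
    "noise_scale_w": 0.8,
    "split_interval": 0.5,
    "use_style_text": False,
    "style_text_weight": 0.7,
}

# reference_audio_path has no default, so it is accepted only as an optional extra key.
OPTIONAL_KEYS = {"reference_audio_path"}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Return a payload with the overrides applied on top of DEFAULTS."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}
    # Illustrative sanity check: the boolean flags must stay real booleans.
    for key in ("use_tone", "line_split", "use_style_text"):
        if not isinstance(payload[key], bool):
            raise TypeError(f"{key} must be a bool")
    return payload
```

With the `replicate` package installed and `REPLICATE_API_TOKEN` set, the payload could then be passed as `replicate.run(MODEL_REF, input=build_input(text_input="..."))`.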
Input Parameters
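style Default: Neutral
Style of speech to use (the available choices may depend on the selected speaker).
speaker Default: EN_MoriCalliope
Speaker (voice) to synthesize. The EN_ prefix appears to select the language: in the example logs below, EN_MoriCalliope is reported as language EN and speaker MoriCalliope.
use_tone Type: boolean, Default: false
Whether to use tone (accent) information in the synthesis (Japanese only).
sdp_ratio Type: number, Default: 0.2
Mixing ratio between the stochastic duration predictor (SDP) and the deterministic duration predictor; higher values typically give more varied rhythm.
line_split Type: boolean, Default: true
Whether to split the input text at line breaks and synthesize each line separately.
style_text Type: string, Default: (empty)
Additional text used to guide the style of the synthesis when use_style_text is true (logged as assist_text).
text_input Type: string, Default: Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!
Text to convert to speech (text-to-speech).
noise_scale Type: number, Default: 0.6
Scale of the random noise sampled during synthesis; affects how expressive or varied the voice sounds.
length_scale Type: number, Default: 1
Multiplier on the predicted phoneme durations: values above 1 slow speech down, values below 1 speed it up.
style_weight Type: number, Default: 5
Strength of the selected style; larger values exaggerate it.
noise_scale_w Type: number, Default: 0.8
Scale of the noise fed to the stochastic duration predictor; affects variation in phoneme timing.
split_interval Type: number, Default: 0.5
Length of the silence, in seconds, inserted between lines when line_split is true.
use_style_text Type: boolean, Default: false
Whether to use style_text in the synthesis (logged as use_assist_text).
style_text_weight Type: number, Default: 0.7
Weight of the style_text effect (logged as assist_text_weight).
reference_audio_path Type: string
Path to a reference audio file (voice-to-voice).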
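The timing parameters interact roughly as sketched below. This is based on our reading of the upstream VITS / Bert-VITS2 inference code, not on this model's source, so treat it as an approximation: `sdp_ratio` blends the log-durations from the stochastic and deterministic duration predictors, `length_scale` multiplies the resulting durations, and with `line_split` each non-empty line is synthesized separately with `split_interval` seconds of silence between lines. The function names and the 44.1 kHz sample rate are assumptions.

```python
import math


def frame_durations(logw_sdp, logw_dp, sdp_ratio=0.2, length_scale=1.0):
    """Blend two log-duration predictions and turn them into integer frame counts.

    VITS-style rule (as we understand it):
        logw   = logw_sdp * sdp_ratio + logw_dp * (1 - sdp_ratio)
        frames = ceil(exp(logw) * length_scale)
    """
    frames = []
    for s, d in zip(logw_sdp, logw_dp):
        logw = s * sdp_ratio + d * (1.0 - sdp_ratio)
        frames.append(math.ceil(math.exp(logw) * length_scale))
    return frames


def plan_lines(text, line_split=True, split_interval=0.5, sample_rate=44100):
    """Return the text chunks to synthesize and the total silence (in samples) between them.

    sample_rate=44100 is an assumed output rate, not confirmed for this model.
    """
    if not line_split:
        return [text], 0
    # Split at line breaks and drop empty lines.
    lines = [t for t in text.split("\n") if t != ""]
    gaps = max(len(lines) - 1, 0)  # silence goes between lines, not after the last one
    return lines, gaps * int(sample_rate * split_interval)
```

For example, doubling `length_scale` doubles every duration before rounding up, while the default single-line example text produces no inserted silence at all.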
Output Schema

Type: string, Format: uri (a URL pointing to the generated audio)
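A minimal standard-library sketch for saving the output locally. The helper names are hypothetical, only http(s) URIs are handled, and the file name is simply taken from the URI path because the schema does not state the audio format.

```python
import os
import urllib.parse
import urllib.request


def output_filename(uri: str, fallback: str = "output") -> str:
    """Derive a local file name from the last path segment of the output URI."""
    parsed = urllib.parse.urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URI, got {uri!r}")
    # Placeholder fallback name when the path has no final segment.
    return os.path.basename(parsed.path) or fallback


def save_output(uri: str, directory: str = ".") -> str:
    """Download the file behind `uri` into `directory` and return the local path."""
    path = os.path.join(directory, output_filename(uri))
    urllib.request.urlretrieve(uri, path)
    return path
```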

Example Execution Logs
[!] model_name: SBV2_HoloLow
[!] model_path: model_assets/SBV2_HoloLow/SBV2_HoloLow.safetensors
[!] text: Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!
[!] language: EN
[!] reference_audio_path: None
[!] sdp_ratio: 0.2
[!] noise_scale: 0.6
[!] noise_scale_w: 0.8
[!] length_scale: 1.0
[!] line_split: True
[!] split_interval: 0.5
[!] assist_text:
[!] assist_text_weight: 0.7
[!] use_assist_text: False
[!] style: Neutral
[!] style_weight: 5.0
[!] kata_tone_json_str:
[!] use_tone: False
[!] speaker: MoriCalliope
[!] Swapped to model 'SBV2_HoloLow'
/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv1d(input, weight, bias, self.stride,
[!] Successful inference, took 3.519026s | MoriCalliope | EN/0.2/0.6/0.8/1.0/Neutral/5.0 | Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!
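The `[!] key: value` lines echo the inputs under the model's internal names: `style_text`, `style_text_weight` and `use_style_text` show up as `assist_text`, `assist_text_weight` and `use_assist_text`, and the speaker `EN_MoriCalliope` appears as `language: EN` plus `speaker: MoriCalliope`. A minimal sketch that parses such logs; the parsing rules and the speaker-prefix convention are inferred from this single example and may not cover every log format.

```python
import re

# Input names that appear under different names in the logs (inferred from the example).
LOG_NAMES = {
    "style_text": "assist_text",
    "style_text_weight": "assist_text_weight",
    "use_style_text": "use_assist_text",
}

PARAM_LINE = re.compile(r"^\[!\] (\w+): ?(.*)$")
TIMING_LINE = re.compile(r"Successful inference, took ([\d.]+)s")


def parse_log(lines):
    """Collect '[!] key: value' pairs and the reported inference time in seconds."""
    params, seconds = {}, None
    for line in lines:
        timing = TIMING_LINE.search(line)
        if timing:
            seconds = float(timing.group(1))
            continue
        match = PARAM_LINE.match(line)
        if match:  # lines like "[!] Swapped to model ..." have no "key:" and are skipped
            params[match.group(1)] = match.group(2)
    return params, seconds


def split_speaker(speaker: str) -> tuple[str, str]:
    """Split e.g. 'EN_MoriCalliope' into ('EN', 'MoriCalliope') (assumed naming scheme)."""
    language, sep, name = speaker.partition("_")
    if not sep:
        return "", speaker  # no language prefix present
    return language, name
```

For the log above, `split_speaker("EN_MoriCalliope")` matches the logged `(language, speaker)` pair, which is a quick way to confirm the request was interpreted as intended.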
Version Details
Version ID: 595ac4205eb84ba9330f178f2f2e4460f9ad9b67bcc8e744a7d8339f01ff24d4
Version Created: June 3, 2024