zsxkib/hololive-style-bert-vits2

900 runs • Jun 2024 • Cog 0.9.8

audio-to-audio speech-style-transfer text-to-speech

Performance

Typical run time: 4.8s

Cold start (first call): ~275s

Total runs: 900

About

🎙️ Hololive text-to-speech and voice-to-voice (Japanese 🇯🇵 + English 🇬🇧)

Example Output

Performance Metrics

Prediction Time: 4.81s

Total Time: 275.19s

All Input Parameters

{
  "style": "Neutral",
  "speaker": "EN_MoriCalliope",
  "use_tone": false,
  "sdp_ratio": 0.2,
  "line_split": true,
  "style_text": "",
  "text_input": "Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!",
  "noise_scale": 0.6,
  "length_scale": 1,
  "style_weight": 5,
  "noise_scale_w": 0.8,
  "split_interval": 0.5,
  "use_style_text": false,
  "style_text_weight": 0.7
}
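
The example parameters above can be sent to the model programmatically. The following is a minimal sketch, assuming the third-party `replicate` Python client is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `synthesize` are hypothetical and exist only for illustration; the model name, version ID, and default values are copied from this page.

```python
# Defaults copied from the "All Input Parameters" example on this page.
DEFAULT_INPUT = {
    "style": "Neutral",
    "speaker": "EN_MoriCalliope",
    "use_tone": False,
    "sdp_ratio": 0.2,
    "line_split": True,
    "style_text": "",
    "text_input": "Hello there! This is test audio of a new Hololive "
                  "text to speech tool running on Replicate!",
    "noise_scale": 0.6,
    "length_scale": 1,
    "style_weight": 5,
    "noise_scale_w": 0.8,
    "split_interval": 0.5,
    "use_style_text": False,
    "style_text_weight": 0.7,
}

# Model name plus the version ID listed under "Version Details".
MODEL_REF = (
    "zsxkib/hololive-style-bert-vits2:"
    "595ac4205eb84ba9330f178f2f2e4460f9ad9b67bcc8e744a7d8339f01ff24d4"
)


def build_input(**overrides):
    """Return the default payload with the given fields overridden.

    Hypothetical helper: rejects names that are not in the input schema.
    """
    allowed = set(DEFAULT_INPUT) | {"reference_audio_path"}
    unknown = set(overrides) - allowed
    if unknown:
        raise KeyError(f"Unknown input parameter(s): {sorted(unknown)}")
    return {**DEFAULT_INPUT, **overrides}


def synthesize(**overrides):
    """Run the model remotely (assumes the `replicate` package and an API token)."""
    import replicate  # imported lazily so build_input works without it

    return replicate.run(MODEL_REF, input=build_input(**overrides))
```

A call such as `synthesize(text_input="Good morning!", length_scale=1.2)` would submit a text-to-speech request; given the cold start time reported above, the first call may take several minutes.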

Input Parameters

style (Default: Neutral): Style of speech to use (choices may be limited based on the selected speaker)
speaker (Default: EN_MoriCalliope): Default speaker
use_tone (Type: boolean, Default: false): Whether to use tone information in the synthesis (Japanese only)
sdp_ratio (Type: number, Default: 0.2): Ratio for speaker-dependent processing
line_split (Type: boolean, Default: true): Whether to split the text into lines for processing
style_text (Type: string, Default: empty): Additional text to guide the style of the synthesis
text_input (Type: string, Default: "Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!"): Text to convert to speech (text-to-voice)
noise_scale (Type: number, Default: 0.6): Scale of noise to add to the synthesis
length_scale (Type: number, Default: 1): Scale of the length of the synthesized speech
style_weight (Type: number, Default: 5): Weight of the style effect
noise_scale_w (Type: number, Default: 0.8): Scale of noise for the waveform
split_interval (Type: number, Default: 0.5): Interval between splits when line_split is true
use_style_text (Type: boolean, Default: false): Whether to use additional style text in the synthesis
style_text_weight (Type: number, Default: 0.7): Weight of the style text effect
reference_audio_path (Type: string): Path to a reference audio file (voice-to-voice)

Output Schema

Output

Type: string • Format: uri
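
Because the output is a URI rather than raw audio, the caller has to fetch the file separately. The sketch below shows one way to do that with the Python standard library, assuming the output arrives as a plain URI string as the schema states. The function name `save_output` and the destination file name are hypothetical, and the audio container format is not stated on this page.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> Path:
    """Stream the resource at `uri` into the file `dest` and return its path."""
    dest_path = Path(dest)
    with urllib.request.urlopen(uri) as response, dest_path.open("wb") as fh:
        # Copy in chunks so large audio files are not held in memory.
        shutil.copyfileobj(response, fh)
    return dest_path
```

For example, `save_output(output_uri, "speech_output")` would write the returned audio to a local file, where `output_uri` is the string returned by a prediction.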

Example Execution Logs

[!] model_name: SBV2_HoloLow
[!] model_path: model_assets/SBV2_HoloLow/SBV2_HoloLow.safetensors
[!] text: Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!
[!] language: EN
[!] reference_audio_path: None
[!] sdp_ratio: 0.2
[!] noise_scale: 0.6
[!] noise_scale_w: 0.8
[!] length_scale: 1.0
[!] line_split: True
[!] split_interval: 0.5
[!] assist_text:
[!] assist_text_weight: 0.7
[!] use_assist_text: False
[!] style: Neutral
[!] style_weight: 5.0
[!] kata_tone_json_str:
[!] use_tone: False
[!] speaker: MoriCalliope
[!] Swapped to model 'SBV2_HoloLow'
/root/.pyenv/versions/3.11.9/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv1d(input, weight, bias, self.stride,
[!] Successful inference, took 3.519026s | MoriCalliope | EN/0.2/0.6/0.8/1.0/Neutral/5.0 | Hello there! This is test audio of a new Hololive text to speech tool running on Replicate!
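
The `[!] key: value` lines in these logs echo the resolved settings for a run, which makes them handy for checking what the model actually received. The following is an illustrative sketch that collects those lines into a dictionary; the helper name `parse_log_fields` is hypothetical, and the line format is inferred only from the sample log above, so other runs may differ.

```python
import re

# Matches lines such as "[!] sdp_ratio: 0.2" or "[!] assist_text:" (empty value).
# Lines like "[!] Swapped to model ..." have no "key:" token and are skipped.
LOG_FIELD = re.compile(r"^\[!\] (\w+):\s?(.*)$")


def parse_log_fields(log_text: str) -> dict[str, str]:
    """Return the key/value settings echoed in the execution log as strings."""
    fields = {}
    for line in log_text.splitlines():
        match = LOG_FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields
```

Values are kept as strings; converting entries such as `sdp_ratio` to numbers is left to the caller.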

Version Details

Version ID: 595ac4205eb84ba9330f178f2f2e4460f9ad9b67bcc8e744a7d8339f01ff24d4
Version Created: June 3, 2024
