jichengdu/fish-speech

▶️ 647 runs 📅 Jan 2025 ⚙️ Cog 0.14.2 🔗 GitHub ⚖️ License
text-to-speech voice-cloning

About

Fish Speech V1.5: SOTA open-source TTS

Example Output


Performance Metrics

2.74s Prediction Time
117.84s Total Time
All Input Parameters
{
  "text": "我的猫,就是全世界最好的猫!",
  "text_reference": "希望你以后能够做得比我还好哟!",
  "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
}
Input Parameters
text (required) Type: string
Text to convert to speech
text_reference (required) Type: string
Text content corresponding to the reference audio
speaker_reference (required) Type: string
Reference audio file
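The three parameters above can be sent to the model with the official Replicate Python client. The sketch below is a minimal example, assuming `pip install replicate` and a `REPLICATE_API_TOKEN` environment variable; the payload values are taken from the "All Input Parameters" example on this page, and the `build_input` helper is a hypothetical convenience, not part of the model's API.

```python
def build_input(text: str, text_reference: str, speaker_reference: str) -> dict:
    """Assemble the input payload; all three fields are required by the model."""
    fields = {
        "text": text,                          # text to convert to speech
        "text_reference": text_reference,      # transcript of the reference audio
        "speaker_reference": speaker_reference,  # URL of the reference audio file
    }
    for name, value in fields.items():
        if not value:
            raise ValueError(f"{name} is required")
    return fields


if __name__ == "__main__":
    import replicate  # official client; the call below needs an API token and network access

    output = replicate.run(
        "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
        input=build_input(
            "我的猫,就是全世界最好的猫!",
            "希望你以后能够做得比我还好哟!",
            "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav",
        ),
    )
    print(output)  # per the output schema, a URI pointing at the generated audio
```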
Output Schema

Output

Type: string (format: uri)
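Since the output is a single URI to the generated audio, saving it locally needs only the standard library. A small helper sketch (the `download` function name and destination path are illustrative, not part of the model):

```python
import shutil
import urllib.request


def download(url: str, dest: str) -> str:
    """Stream the audio at `url` to the local path `dest` and return that path."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

For example, `download(output, "cloned_voice.wav")` would store the clip returned by a prediction.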

Example Execution Logs
2025-03-21 07:13:58.443 | INFO     | tools.llama.generate:generate_long:789 - Encoded text: 我的猫,就是全世界最好的猫!
2025-03-21 07:13:58.443 | INFO     | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
  0%|          | 0/8070 [00:00<?, ?it/s]/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
warnings.warn(
  0%|          | 4/8070 [00:00<03:49, 35.18it/s]
  0%|          | 8/8070 [00:00<03:47, 35.46it/s]
  0%|          | 12/8070 [00:00<03:46, 35.55it/s]
  0%|          | 16/8070 [00:00<03:46, 35.59it/s]
  0%|          | 20/8070 [00:00<03:46, 35.52it/s]
  0%|          | 24/8070 [00:00<03:47, 35.30it/s]
  0%|          | 28/8070 [00:00<03:48, 35.14it/s]
  0%|          | 32/8070 [00:00<03:48, 35.21it/s]
  0%|          | 36/8070 [00:01<03:47, 35.30it/s]
  0%|          | 40/8070 [00:01<03:46, 35.42it/s]
  1%|          | 44/8070 [00:01<03:45, 35.52it/s]
  1%|          | 48/8070 [00:01<03:45, 35.59it/s]
  1%|          | 52/8070 [00:01<03:44, 35.64it/s]
  1%|          | 56/8070 [00:01<03:44, 35.67it/s]
1%|          | 56/8070 [00:01<03:49, 34.85it/s]
2025-03-21 07:14:00.300 | INFO     | tools.llama.generate:generate_long:861 - Generated 58 tokens in 1.86 seconds, 31.24 tokens/sec
2025-03-21 07:14:00.301 | INFO     | tools.llama.generate:generate_long:864 - Bandwidth achieved: 19.93 GB/s
2025-03-21 07:14:00.301 | INFO     | tools.llama.generate:generate_long:869 - GPU Memory used: 2.03 GB
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv1d(input, weight, bias, self.stride,
Version Details
Version ID
11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc
Version Created
March 21, 2025
Run on Replicate →