jichengdu/fish-speech
About
Fish Speech V1.5: a state-of-the-art open-source TTS model

Example Output
Performance Metrics
- Prediction Time: 2.74s
- Total Time: 117.84s
All Input Parameters
{
  "text": "我的猫,就是全世界最好的猫!",
  "text_reference": "希望你以后能够做得比我还好哟!",
  "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
}
Input Parameters
- text (required)
  - Text to convert to speech
- text_reference (required)
  - Text content corresponding to the reference audio
- speaker_reference (required)
  - Reference audio file
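The three required parameters above can be sent through the Replicate Python client. A minimal sketch follows; the parameter names, example values, and version ID are taken from this page, while the `build_input` helper is hypothetical and the `replicate.run` call is shown commented out because it needs a `REPLICATE_API_TOKEN`.

```python
# Hypothetical helper that assembles the input payload for this model.
# All three fields are required by the schema above.
def build_input(text: str, text_reference: str, speaker_reference: str) -> dict:
    payload = {
        "text": text,
        "text_reference": text_reference,
        "speaker_reference": speaker_reference,
    }
    for name, value in payload.items():
        if not value:
            raise ValueError(f"{name} is required")
    return payload

# Example values from the run shown on this page.
payload = build_input(
    text="我的猫,就是全世界最好的猫!",
    text_reference="希望你以后能够做得比我还好哟!",
    speaker_reference="https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav",
)

# With an API token configured, the actual call would look like:
# import replicate
# audio_url = replicate.run(
#     "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
#     input=payload,
# )
```

Note that `speaker_reference` is a URL to the reference audio whose voice is cloned; `text_reference` must be the transcript of that audio for the zero-shot prompt to work well.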
Output Schema
Example Execution Logs
2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:789 - Encoded text: 我的猫,就是全世界最好的猫!
2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
0%| | 0/8070 [00:00<?, ?it/s]
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature. warnings.warn(
0%| | 4/8070 [00:00<03:49, 35.18it/s]
0%| | 8/8070 [00:00<03:47, 35.46it/s]
0%| | 12/8070 [00:00<03:46, 35.55it/s]
0%| | 16/8070 [00:00<03:46, 35.59it/s]
0%| | 20/8070 [00:00<03:46, 35.52it/s]
0%| | 24/8070 [00:00<03:47, 35.30it/s]
0%| | 28/8070 [00:00<03:48, 35.14it/s]
0%| | 32/8070 [00:00<03:48, 35.21it/s]
0%| | 36/8070 [00:01<03:47, 35.30it/s]
0%| | 40/8070 [00:01<03:46, 35.42it/s]
1%| | 44/8070 [00:01<03:45, 35.52it/s]
1%| | 48/8070 [00:01<03:45, 35.59it/s]
1%| | 52/8070 [00:01<03:44, 35.64it/s]
1%| | 56/8070 [00:01<03:44, 35.67it/s]
1%| | 56/8070 [00:01<03:49, 34.85it/s]
2025-03-21 07:14:00.300 | INFO | tools.llama.generate:generate_long:861 - Generated 58 tokens in 1.86 seconds, 31.24 tokens/sec
2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:864 - Bandwidth achieved: 19.93 GB/s
2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:869 - GPU Memory used: 2.03 GB
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.) return F.conv1d(input, weight, bias, self.stride,
Version Details
- Version ID: 11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc
- Version Created: March 21, 2025