ttsds/fishspeech_1_2_sft 📝🖼️ → 🖼️
About
The Fish Speech V1.2 SFT (supervised fine-tuning) model: a text-to-speech model that clones a target voice from a short reference audio clip and its transcript.
Example Output
Performance Metrics
- Prediction Time: 5.09s
- Total Time: 78.24s
All Input Parameters
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "text_reference": "and keeping eternity before the eyes, though much",
  "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
Input Parameters
- text (required): the text to synthesize.
- text_reference (required): transcript of the reference audio clip.
- speaker_reference (required): URL of the reference audio clip whose voice is cloned.
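A request with these inputs can be sketched with the Replicate Python client. The model reference below combines the model name with the version ID listed under Version Details; the `validate_input` helper is illustrative and not part of the Replicate API.

```python
# Sketch of calling ttsds/fishspeech_1_2_sft via the Replicate Python client.
# Assumes the `replicate` package is installed and REPLICATE_API_TOKEN is set
# before run_tts() is actually called.

REQUIRED_KEYS = {"text", "text_reference", "speaker_reference"}


def validate_input(payload: dict) -> dict:
    """Illustrative helper: check that all required parameters are present."""
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"missing required input(s): {sorted(missing)}")
    return payload


def run_tts(payload: dict):
    """Submit the request; returns the model output (a URL to the audio)."""
    import replicate  # imported lazily so the sketch loads without the package

    return replicate.run(
        "ttsds/fishspeech_1_2_sft:"
        "34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20",
        input=payload,
    )


payload = validate_input({
    "text": "With tenure, Suzie'd have all the more leisure for yachting, "
            "but her publications are no good.",
    "text_reference": "and keeping eternity before the eyes, though much",
    "speaker_reference": "https://replicate.delivery/pbxt/"
                         "MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/"
                         "example_en.wav",
})
```

Calling `run_tts(payload)` then submits the prediction and blocks until it completes.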
Output Schema
Example Execution Logs
2025-01-28 18:45:39.050 | INFO | tools.llama.generate:generate_long:432 - Encoded text: With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.
2025-01-28 18:45:39.050 | INFO | tools.llama.generate:generate_long:450 - Generating sentence 1/1 of sample 1/1
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature. warnings.warn(
[progress bar output trimmed: 0/3892 → 243/3892 steps at ~57 it/s]
2025-01-28 18:45:43.499 | INFO | tools.llama.generate:generate_long:505 - Generated 245 tokens in 4.45 seconds, 55.07 tokens/sec
2025-01-28 18:45:43.499 | INFO | tools.llama.generate:generate_long:508 - Bandwidth achieved: 27.00 GB/s
2025-01-28 18:45:43.499 | INFO | tools.llama.generate:generate_long:513 - GPU Memory used: 1.56 GB
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.) return F.conv1d(input, weight, bias, self.stride,
Next sample
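The throughput summary in these logs can be checked or extracted programmatically. A minimal sketch, assuming only the `Generated N tokens in S seconds, R tokens/sec` line format shown above (the `parse_throughput` helper is illustrative):

```python
import re

# Summary line as emitted by tools.llama.generate in the logs above.
LOG_LINE = (
    "2025-01-28 18:45:43.499 | INFO | tools.llama.generate:generate_long:505 "
    "- Generated 245 tokens in 4.45 seconds, 55.07 tokens/sec"
)


def parse_throughput(line: str):
    """Extract (tokens, seconds, tokens_per_sec) from a summary log line."""
    m = re.search(
        r"Generated (\d+) tokens in ([\d.]+) seconds, ([\d.]+) tokens/sec", line
    )
    if not m:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


tokens, seconds, rate = parse_throughput(LOG_LINE)
# tokens / seconds agrees with the reported rate up to rounding of the
# elapsed time (245 / 4.45 ≈ 55.06 vs. the logged 55.07).
```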
Version Details
- Version ID: 34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20
- Version Created: January 28, 2025