ttsds/fishspeech_1_1

192 runs, Jan 2025, Cog 0.13.6
text-to-speech voice-cloning

About

The Fish Speech V1.1 model.

Example Output

Performance Metrics

3.10s Prediction Time
83.63s Total Time
All Input Parameters
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "text_reference": "and keeping eternity before the eyes, though much",
  "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
Input Parameters
text (required) Type: string
text_reference (required) Type: string
speaker_reference (required) Type: string
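
The three required inputs above could be assembled and sent with the Replicate Python client roughly as follows. This is a minimal sketch, not the model author's code: the helper names `build_input` and `synthesize` are hypothetical, the model reference combines the page title with the version ID listed under Version Details, and it assumes the third-party `replicate` package is installed and `REPLICATE_API_TOKEN` is set. Judging by the example values, `text_reference` appears to be the transcript of the `speaker_reference` audio, but the page does not state this.

```python
# Sketch only: model name and version hash are copied from this page;
# the helper names below are hypothetical, not part of any library.
MODEL_REF = (
    "ttsds/fishspeech_1_1:"
    "278768f6f5c2cc5a51d11dcf9fea9307063f6e053f489a9ccf16c46623e2a001"
)

REQUIRED_FIELDS = ("text", "text_reference", "speaker_reference")


def build_input(text: str, text_reference: str, speaker_reference: str) -> dict:
    """Return the input payload, checking that every required field is a non-empty string."""
    payload = {
        "text": text,
        "text_reference": text_reference,
        "speaker_reference": speaker_reference,
    }
    for name in REQUIRED_FIELDS:
        value = payload[name]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name!r} must be a non-empty string")
    return payload


def synthesize(payload: dict):
    """Run the model on Replicate and return whatever the client returns."""
    # Imported lazily so the validation helper works without the package.
    import replicate  # third-party: pip install replicate

    return replicate.run(MODEL_REF, input=payload)
```

Calling `synthesize(build_input(text, text_reference, speaker_reference))` with the values from the example above would start a prediction; nothing is sent until `synthesize` is called.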
Output Schema

Output

Type: string, Format: uri
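
Since the output is described as a URI string, one way to keep the generated audio is to download it to a local file. The sketch below uses only the standard library; the helper name `save_output` and the destination filename are hypothetical. Newer versions of the Replicate client may return a file-like object rather than a plain string, in which case this helper would need adapting.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> Path:
    """Download the resource at `uri` and write it to `dest` (hypothetical helper)."""
    dest_path = Path(dest)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(uri) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

For example, `save_output(output_uri, "speech.wav")` would store the result next to the script, assuming `output_uri` holds the string returned by the model.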

Example Execution Logs
2025-01-28 16:26:02.222 | INFO     | tools.llama.generate:generate_long:491 - Encoded text: With tenure, Suzie'd have all
2025-01-28 16:26:02.222 | INFO     | tools.llama.generate:generate_long:491 - Encoded text: the more leisure for yachting,
2025-01-28 16:26:02.222 | INFO     | tools.llama.generate:generate_long:491 - Encoded text: but her publications are no
2025-01-28 16:26:02.222 | INFO     | tools.llama.generate:generate_long:491 - Encoded text: good.
2025-01-28 16:26:02.223 | INFO     | tools.llama.generate:generate_long:509 - Generating sentence 1/4 of sample 1/1
  0%|          | 0/1858 [00:00<?, ?it/s]
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
warnings.warn(
  0%|          | 6/1858 [00:00<00:31, 59.00it/s]
  1%|          | 12/1858 [00:00<00:32, 57.47it/s]
  1%|          | 19/1858 [00:00<00:30, 60.07it/s]
  1%|▏         | 26/1858 [00:00<00:29, 61.43it/s]
  2%|▏         | 33/1858 [00:00<00:29, 62.05it/s]
  2%|▏         | 40/1858 [00:00<00:29, 62.57it/s]
  3%|▎         | 47/1858 [00:00<00:29, 62.38it/s]
3%|▎         | 49/1858 [00:00<00:29, 60.40it/s]
2025-01-28 16:26:03.118 | INFO     | tools.llama.generate:generate_long:565 - Generated 51 tokens in 0.90 seconds, 56.95 tokens/sec
2025-01-28 16:26:03.118 | INFO     | tools.llama.generate:generate_long:568 - Bandwidth achieved: 22.22 GB/s
2025-01-28 16:26:03.118 | INFO     | tools.llama.generate:generate_long:573 - GPU Memory used: 2.45 GB
2025-01-28 16:26:03.119 | INFO     | tools.llama.generate:generate_long:509 - Generating sentence 2/4 of sample 1/1
  0%|          | 0/1759 [00:00<?, ?it/s]
  0%|          | 7/1759 [00:00<00:27, 62.93it/s]
  1%|          | 14/1759 [00:00<00:27, 62.96it/s]
  1%|          | 21/1759 [00:00<00:27, 62.82it/s]
  2%|▏         | 28/1759 [00:00<00:28, 61.75it/s]
  2%|▏         | 35/1759 [00:00<00:27, 62.10it/s]
2%|▏         | 40/1759 [00:00<00:28, 60.85it/s]
2025-01-28 16:26:03.792 | INFO     | tools.llama.generate:generate_long:565 - Generated 42 tokens in 0.67 seconds, 62.37 tokens/sec
2025-01-28 16:26:03.793 | INFO     | tools.llama.generate:generate_long:568 - Bandwidth achieved: 24.34 GB/s
2025-01-28 16:26:03.793 | INFO     | tools.llama.generate:generate_long:573 - GPU Memory used: 2.45 GB
2025-01-28 16:26:03.793 | INFO     | tools.llama.generate:generate_long:509 - Generating sentence 3/4 of sample 1/1
  0%|          | 0/1672 [00:00<?, ?it/s]
  0%|          | 7/1672 [00:00<00:26, 63.08it/s]
  1%|          | 14/1672 [00:00<00:26, 62.91it/s]
  1%|▏         | 21/1672 [00:00<00:26, 62.97it/s]
  2%|▏         | 28/1672 [00:00<00:26, 62.73it/s]
  2%|▏         | 35/1672 [00:00<00:26, 62.46it/s]
2%|▏         | 39/1672 [00:00<00:26, 61.00it/s]
2025-01-28 16:26:04.454 | INFO     | tools.llama.generate:generate_long:565 - Generated 41 tokens in 0.66 seconds, 62.06 tokens/sec
2025-01-28 16:26:04.454 | INFO     | tools.llama.generate:generate_long:568 - Bandwidth achieved: 24.21 GB/s
2025-01-28 16:26:04.454 | INFO     | tools.llama.generate:generate_long:573 - GPU Memory used: 2.45 GB
2025-01-28 16:26:04.454 | INFO     | tools.llama.generate:generate_long:509 - Generating sentence 4/4 of sample 1/1
  0%|          | 0/1608 [00:00<?, ?it/s]
  0%|          | 7/1608 [00:00<00:25, 63.11it/s]
  1%|          | 14/1608 [00:00<00:25, 63.00it/s]
1%|          | 14/1608 [00:00<00:27, 58.74it/s]
2025-01-28 16:26:04.709 | INFO     | tools.llama.generate:generate_long:565 - Generated 16 tokens in 0.25 seconds, 62.81 tokens/sec
2025-01-28 16:26:04.709 | INFO     | tools.llama.generate:generate_long:568 - Bandwidth achieved: 24.51 GB/s
2025-01-28 16:26:04.710 | INFO     | tools.llama.generate:generate_long:573 - GPU Memory used: 2.45 GB
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv1d(input, weight, bias, self.stride,
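
The per-sentence "Generated N tokens in S seconds" lines above can be combined into an overall throughput figure. The following is a small sketch that parses those four messages (copied from the logs, with timestamps omitted) and sums them; the function name `summarize` is hypothetical.

```python
import re

# The four summary messages from the logs above, timestamps omitted.
LOG_EXCERPT = """\
Generated 51 tokens in 0.90 seconds, 56.95 tokens/sec
Generated 42 tokens in 0.67 seconds, 62.37 tokens/sec
Generated 41 tokens in 0.66 seconds, 62.06 tokens/sec
Generated 16 tokens in 0.25 seconds, 62.81 tokens/sec
"""

PATTERN = re.compile(r"Generated (\d+) tokens in ([\d.]+) seconds")


def summarize(log_text: str) -> tuple[int, float, float]:
    """Return (total tokens, total seconds, overall tokens per second)."""
    pairs = [(int(n), float(s)) for n, s in PATTERN.findall(log_text)]
    total_tokens = sum(n for n, _ in pairs)
    total_seconds = sum(s for _, s in pairs)
    rate = total_tokens / total_seconds if total_seconds else 0.0
    return total_tokens, total_seconds, rate


tokens, seconds, rate = summarize(LOG_EXCERPT)
print(f"{tokens} tokens in {seconds:.2f} s, {rate:.2f} tokens/sec overall")
```

For this run the four sentences add up to 150 tokens in about 2.48 seconds of token generation, which accounts for most of the 3.10s prediction time reported under Performance Metrics.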
Version Details
Version ID
278768f6f5c2cc5a51d11dcf9fea9307063f6e053f489a9ccf16c46623e2a001
Version Created
March 24, 2025