ttsds/fishspeech_1_5

583 runs · Jan 2025 · Cog 0.13.6 · GitHub · Paper · License
Tags: text-to-speech, voice-cloning

About

Fish Speech V1.5, a text-to-speech model with voice cloning: given a short reference audio clip and its transcript, it synthesizes the input text in the reference speaker's voice.

Example Output

(example audio output)

Performance Metrics

5.18s Prediction Time
110.80s Total Time
All Input Parameters
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "text_reference": "and keeping eternity before the eyes, though much",
  "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
Input Parameters
text (required, string): the text to synthesize
text_reference (required, string): transcript of the reference audio clip
speaker_reference (required, string): URL of the reference audio clip whose voice is cloned
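As a rough sketch of how these parameters are supplied in practice, the snippet below calls this version through the official replicate Python client, reusing the example input from above. It assumes REPLICATE_API_TOKEN is set in the environment; the variable names and comments are illustrative, not part of the model's API.

import replicate

# Pin the exact version shown under "Version Details" below.
MODEL = (
    "ttsds/fishspeech_1_5:"
    "f81057e21ad025b00703b8a2f63283d108829b7512f85c4c723c3edcc125f1bc"
)

output = replicate.run(
    MODEL,
    input={
        "text": "With tenure, Suzie'd have all the more leisure for yachting, "
                "but her publications are no good.",
        "text_reference": "and keeping eternity before the eyes, though much",
        "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
    },
)

print(output)  # URI of the generated audio (see Output Schema below)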
Output Schema

Output

Type: string, Format: uri
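Because the output is a URI pointing to the generated audio, a minimal follow-up step (assuming the output value from the sketch above; the local filename is arbitrary, and newer client versions may wrap the URI in a file-like object) is to fetch it with the standard library:

import urllib.request

# `output` is treated as a plain URI string, per the Output Schema above.
audio_url = str(output)
urllib.request.urlretrieve(audio_url, "fishspeech_output.wav")
print("saved fishspeech_output.wav")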

Example Execution Logs
2025-01-28 18:11:53.629 | INFO     | tools.llama.generate:generate_long:789 - Encoded text: With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.
2025-01-28 18:11:53.630 | INFO     | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
  0%|          | 0/8055 [00:00<?, ?it/s]/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
warnings.warn(
  0%|          | 4/8055 [00:00<03:46, 35.48it/s]
  ... (intermediate progress updates omitted) ...
  2%|▏         | 147/8055 [00:04<03:40, 35.85it/s]
2025-01-28 18:11:57.940 | INFO     | tools.llama.generate:generate_long:861 - Generated 149 tokens in 4.31 seconds, 34.57 tokens/sec
2025-01-28 18:11:57.940 | INFO     | tools.llama.generate:generate_long:864 - Bandwidth achieved: 22.06 GB/s
2025-01-28 18:11:57.940 | INFO     | tools.llama.generate:generate_long:869 - GPU Memory used: 2.02 GB
Next sample
Version Details
Version ID
f81057e21ad025b00703b8a2f63283d108829b7512f85c4c723c3edcc125f1bc
Version Created
January 28, 2025