ttsds/fishspeech_1_4

▶️ 219 runs 📅 Jan 2025 ⚙️ Cog 0.13.6 🔗 GitHub 📄 Paper ⚖️ License
speech-style-transfer text-to-speech voice-cloning

About

Fish Speech V1.4, a text-to-speech model that clones a speaker's voice from a short reference recording and its transcript.

Example Output


Performance Metrics

4.11s Prediction Time
78.19s Total Time
All Input Parameters
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "text_reference": "and keeping eternity before the eyes, though much",
  "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
Input Parameters
text (required) Type: string
text_reference (required) Type: string
speaker_reference (required) Type: string
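All three parameters are required strings, as the schema above lists; `text_reference` is the transcript of the `speaker_reference` audio. A minimal sketch of assembling and validating the payload before sending it to the API — `build_input` is a hypothetical helper, not part of any client library:

```python
# Hypothetical helper: assemble the input payload for this model.
# All three parameters are required strings (see the schema above).
REQUIRED = ("text", "text_reference", "speaker_reference")

def build_input(text: str, text_reference: str, speaker_reference: str) -> dict:
    """Build the input dict and check that no required field is empty."""
    payload = {
        "text": text,
        "text_reference": text_reference,
        "speaker_reference": speaker_reference,
    }
    missing = [k for k in REQUIRED if not payload[k]]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return payload

payload = build_input(
    "With tenure, Suzie'd have all the more leisure for yachting, "
    "but her publications are no good.",
    "and keeping eternity before the eyes, though much",
    "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
)
# The payload can then be passed to the official Replicate client, e.g.:
#   import replicate  # pip install replicate; needs REPLICATE_API_TOKEN
#   output = replicate.run(
#       "ttsds/fishspeech_1_4:7d55af8314c9ec4206d76c1e958cd8807c9c1bd59bffcfec363aea89e7179dd8",
#       input=payload,
#   )
```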
Output Schema

Output

Type: string · Format: uri
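Since the output is a direct URI to the generated audio file, saving it locally only takes a filename and a download. A sketch using the standard library (`output_filename` is a hypothetical helper; the example URI mimics the `replicate.delivery` shape but is made up):

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def output_filename(uri: str) -> str:
    """Derive a local filename from the last path segment of the output URI."""
    return PurePosixPath(urlparse(uri).path).name

# Example with a made-up URI in the replicate.delivery shape:
name = output_filename("https://replicate.delivery/pbxt/abc123/output.wav")
# download with, e.g., urllib.request.urlretrieve(uri, name)
```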

Example Execution Logs
2025-01-28 18:47:27.414 | INFO     | tools.llama.generate:generate_long:759 - Encoded text: With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.
2025-01-28 18:47:27.415 | INFO     | tools.llama.generate:generate_long:777 - Generating sentence 1/1 of sample 1/1
  0%|          | 0/3965 [00:00<?, ?it/s]/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
warnings.warn(
  0%|          | 4/3965 [00:00<01:44, 37.72it/s]
  ...
  3%|▎         | 122/3965 [00:03<01:41, 37.95it/s]
2025-01-28 18:47:30.827 | INFO     | tools.llama.generate:generate_long:832 - Generated 124 tokens in 3.41 seconds, 36.34 tokens/sec
2025-01-28 18:47:30.827 | INFO     | tools.llama.generate:generate_long:835 - Bandwidth achieved: 17.97 GB/s
2025-01-28 18:47:30.827 | INFO     | tools.llama.generate:generate_long:840 - GPU Memory used: 1.63 GB
Version Details
Version ID
7d55af8314c9ec4206d76c1e958cd8807c9c1bd59bffcfec363aea89e7179dd8
Version Created
January 28, 2025