thomcle/chatterbox-tts

991 runs • May 2025 • Cog 0.15.1

text-to-speech voice-cloning

About

Chatterbox is a state-of-the-art zero-shot text-to-speech (TTS) model. It can clone a voice from a single reference audio clip.

Example Output

Performance Metrics

7.23s Prediction Time

7.25s Total Time

All Input Parameters

{
  "text": "Then I would never talk to that person about boa constrictors, or primeval forests, or stars. I would bring myself down to his level.",
  "cfg_weight": 0.5,
  "temperature": 0.8,
  "exaggeration": 0.5,
  "audio_prompt_path": "https://maskgct.github.io/audios/celeb_samples/rick_0.wav"
}
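The request above could be sent from Python roughly as follows. This is a minimal sketch, assuming the third-party `replicate` client package is installed and a `REPLICATE_API_TOKEN` environment variable is set; the helper name `run_example` is hypothetical, and the model reference simply joins the model name with the version ID listed under Version Details.

```python
# Model name plus the version ID from the "Version Details" section.
MODEL_REF = (
    "thomcle/chatterbox-tts:"
    "3f5f9c195086737dda710bf504330f71e786d0a361b505e377c8b10122af9d32"
)

# The same parameters as the example prediction on this page.
EXAMPLE_INPUT = {
    "text": (
        "Then I would never talk to that person about boa constrictors, "
        "or primeval forests, or stars. I would bring myself down to his level."
    ),
    "cfg_weight": 0.5,
    "temperature": 0.8,
    "exaggeration": 0.5,
    "audio_prompt_path": "https://maskgct.github.io/audios/celeb_samples/rick_0.wav",
}


def run_example():
    """Run the example prediction (hypothetical helper, performs a network call)."""
    # Imported lazily so the constants above can be reused without the package.
    # Assumption: `pip install replicate` and REPLICATE_API_TOKEN are in place.
    import replicate

    return replicate.run(MODEL_REF, input=EXAMPLE_INPUT)
```

Calling `run_example()` should return a reference to the generated audio. Depending on the client version, that may be a plain URI string or a file-like wrapper around it.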

Input Parameters

text (string, default: "Wow! That was an incredible firework display."): Text to synthesize.
cfg_weight (number, default: 0.5): Balances text fidelity and creativity; higher values make speech closer to the input text.
temperature (number, default: 0.8): Adjusts randomness in speech generation; higher values produce more varied and natural output.
exaggeration (number, default: 0.5): Controls how expressive or exaggerated the speech sounds; higher values increase emotional intensity.
audio_prompt_path (string): Reference audio file to clone.
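To make the defaults concrete, the parameters listed here can be assembled into an input payload with a small helper. This is only a sketch: `build_input` is a hypothetical name, the type checks are illustrative (the page does not document valid ranges, so none are enforced), and leaving out `audio_prompt_path` when no reference clip is given is an assumption about how the model treats the missing field.

```python
from typing import Optional

DEFAULT_TEXT = "Wow! That was an incredible firework display."


def _check_number(name: str, value) -> float:
    """Reject non-numeric values for the numeric parameters (bool is excluded)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a number, got {type(value).__name__}")
    return value


def build_input(
    text: str = DEFAULT_TEXT,
    cfg_weight: float = 0.5,
    temperature: float = 0.8,
    exaggeration: float = 0.5,
    audio_prompt_path: Optional[str] = None,
) -> dict:
    """Build an input payload using the defaults documented on this page."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    payload = {
        "text": text,
        "cfg_weight": _check_number("cfg_weight", cfg_weight),
        "temperature": _check_number("temperature", temperature),
        "exaggeration": _check_number("exaggeration", exaggeration),
    }
    # Assumption: the reference clip is optional, so it is sent only when given.
    if audio_prompt_path is not None:
        payload["audio_prompt_path"] = audio_prompt_path
    return payload
```

For instance, `build_input(exaggeration=0.9)` keeps every other value at its documented default and only raises the expressiveness setting.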

Output Schema

Output

Type: string • Format: uri
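Since the output is a URI rather than raw audio bytes, a caller usually has to fetch the file as a separate step. The sketch below uses only the Python standard library; the function names are hypothetical, and the `output.wav` fallback name assumes the result is a WAV file, which this page does not state.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def output_filename(uri: str, fallback: str = "output.wav") -> str:
    """Derive a local file name from the output URI, validating its scheme."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URI, got: {uri!r}")
    # Use the last path segment, or the fallback when the path has none.
    return Path(parsed.path).name or fallback


def save_output(uri: str, directory: str = ".") -> Path:
    """Download the generated audio to `directory` and return the local path."""
    destination = Path(directory) / output_filename(uri)
    with urllib.request.urlopen(uri) as response, open(destination, "wb") as fh:
        shutil.copyfileobj(response, fh)  # stream to disk in chunks
    return destination
```

If the client library returns a file-like object instead of a plain string, converting it to its URL first should make it usable with `save_output`.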

Example Execution Logs

WARNING:root:Reference mel length is not equal to 2 * reference token length.
Sampling:   0%|          | 0/1000 [00:00<?, ?it/s]
Sampling:   0%|          | 2/1000 [00:00<00:58, 17.12it/s]
Sampling:   1%|          | 6/1000 [00:00<00:39, 25.40it/s]
Sampling:   1%|          | 10/1000 [00:00<00:34, 28.29it/s]
Sampling:   1%|▏         | 14/1000 [00:00<00:33, 29.14it/s]
Sampling:   2%|▏         | 18/1000 [00:00<00:33, 29.71it/s]
Sampling:   2%|▏         | 22/1000 [00:00<00:31, 30.60it/s]
Sampling:   3%|▎         | 26/1000 [00:00<00:31, 31.15it/s]
Sampling:   3%|▎         | 30/1000 [00:01<00:31, 31.16it/s]
Sampling:   3%|▎         | 34/1000 [00:01<00:31, 31.09it/s]
Sampling:   4%|▍         | 38/1000 [00:01<00:30, 31.05it/s]
Sampling:   4%|▍         | 42/1000 [00:01<00:30, 31.60it/s]
Sampling:   5%|▍         | 46/1000 [00:01<00:30, 31.25it/s]
Sampling:   5%|▌         | 50/1000 [00:01<00:30, 31.61it/s]
Sampling:   5%|▌         | 54/1000 [00:01<00:29, 31.83it/s]
Sampling:   6%|▌         | 58/1000 [00:01<00:29, 32.00it/s]
Sampling:   6%|▌         | 62/1000 [00:02<00:29, 31.98it/s]
Sampling:   7%|▋         | 66/1000 [00:02<00:29, 32.10it/s]
Sampling:   7%|▋         | 70/1000 [00:02<00:29, 32.00it/s]
Sampling:   7%|▋         | 74/1000 [00:02<00:28, 32.18it/s]
Sampling:   8%|▊         | 78/1000 [00:02<00:29, 31.66it/s]
Sampling:   8%|▊         | 82/1000 [00:02<00:28, 31.85it/s]
Sampling:   9%|▊         | 86/1000 [00:02<00:28, 32.25it/s]
Sampling:   9%|▉         | 90/1000 [00:02<00:28, 32.31it/s]
Sampling:   9%|▉         | 94/1000 [00:03<00:28, 32.06it/s]
Sampling:  10%|▉         | 98/1000 [00:03<00:27, 32.33it/s]
Sampling:  10%|█         | 102/1000 [00:03<00:27, 32.26it/s]
Sampling:  11%|█         | 106/1000 [00:03<00:27, 32.38it/s]
Sampling:  11%|█         | 110/1000 [00:03<00:27, 32.14it/s]
Sampling:  11%|█▏        | 114/1000 [00:03<00:27, 32.41it/s]
Sampling:  12%|█▏        | 118/1000 [00:03<00:27, 32.64it/s]
Sampling:  12%|█▏        | 122/1000 [00:03<00:27, 32.15it/s]
Sampling:  13%|█▎        | 126/1000 [00:04<00:27, 32.24it/s]
Sampling:  13%|█▎        | 130/1000 [00:04<00:26, 32.24it/s]
Sampling:  13%|█▎        | 134/1000 [00:04<00:27, 31.22it/s]
Sampling:  14%|█▍        | 138/1000 [00:04<00:27, 31.70it/s]
Sampling:  14%|█▍        | 142/1000 [00:04<00:26, 31.80it/s]
Sampling:  15%|█▍        | 146/1000 [00:04<00:26, 32.17it/s]
Sampling:  15%|█▌        | 150/1000 [00:04<00:26, 31.71it/s]
Sampling:  15%|█▌        | 154/1000 [00:04<00:26, 31.60it/s]
Sampling:  16%|█▌        | 158/1000 [00:05<00:26, 31.53it/s]
Sampling:  16%|█▌        | 162/1000 [00:05<00:26, 31.55it/s]
Sampling:  17%|█▋        | 166/1000 [00:05<00:26, 31.52it/s]
Sampling:  17%|█▋        | 170/1000 [00:05<00:26, 31.89it/s]
Sampling:  17%|█▋        | 174/1000 [00:05<00:25, 31.97it/s]
Sampling:  18%|█▊        | 178/1000 [00:05<00:25, 32.06it/s]
Sampling:  18%|█▊        | 180/1000 [00:05<00:25, 31.54it/s]

Version Details

Version ID: 3f5f9c195086737dda710bf504330f71e786d0a361b505e377c8b10122af9d32
Version Created: May 30, 2025
