thomcle/chatterbox-tts
About
Chatterbox is a state-of-the-art zero-shot text-to-speech (TTS) model.

Performance Metrics
- Prediction Time: 7.23s
- Total Time: 7.25s
All Input Parameters
{
  "text": "Then I would never talk to that person about boa constrictors, or primeval forests, or stars. I would bring myself down to his level.",
  "cfg_weight": 0.5,
  "temperature": 0.8,
  "exaggeration": 0.5,
  "audio_prompt_path": "https://maskgct.github.io/audios/celeb_samples/rick_0.wav"
}
Input Parameters
- text: Text to synthesize.
- cfg_weight: Balances text fidelity and creativity; higher values make speech closer to the input text.
- temperature: Adjusts randomness in speech generation; higher values produce more varied and natural output.
- exaggeration: Controls how expressive or exaggerated the speech sounds; higher values increase emotional intensity.
- audio_prompt_path: Reference audio file to clone.
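The parameters above can be assembled into a request programmatically. The following is a minimal sketch, assuming the model is called through the `replicate` Python client (`pip install replicate`, with `REPLICATE_API_TOKEN` set in the environment). The helper names `build_input` and `synthesize` are hypothetical, and the default values simply mirror the example on this page; they are not confirmed to be the model's own defaults.

```python
import json
from numbers import Real

# Model reference in "owner/name:version" form, using the version ID from this page.
MODEL_REF = (
    "thomcle/chatterbox-tts:"
    "3f5f9c195086737dda710bf504330f71e786d0a361b505e377c8b10122af9d32"
)


def build_input(text, audio_prompt_path, cfg_weight=0.5, temperature=0.8,
                exaggeration=0.5):
    """Assemble the input dict for the model.

    Defaults are copied from the example on this page (an assumption),
    not from a documented schema.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    numeric = {
        "cfg_weight": cfg_weight,
        "temperature": temperature,
        "exaggeration": exaggeration,
    }
    for name, value in numeric.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{name} must be a number")
    return {"text": text, "audio_prompt_path": audio_prompt_path, **numeric}


def synthesize(payload):
    """Send the payload to the hosted model (network call, needs an API token)."""
    # Imported lazily so build_input works without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    payload = build_input(
        "Then I would never talk to that person about boa constrictors.",
        "https://maskgct.github.io/audios/celeb_samples/rick_0.wav",
    )
    print(json.dumps(payload, indent=2))
    # output = synthesize(payload)  # uncomment to run a real prediction
```

Keeping payload construction separate from the network call makes it possible to inspect or log the exact inputs before spending a prediction.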
Output Schema
Output
Example Execution Logs
WARNING:root:Reference mel length is not equal to 2 * reference token length.
Sampling: 0%| | 0/1000 [00:00<?, ?it/s]
...
Sampling: 18%|█▊ | 180/1000 [00:05<00:25, 31.54it/s]
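The progress entries in the log appear to follow the familiar tqdm layout (`step/total [elapsed<remaining, rate]`). As a rough sketch under that assumption, the snippet below extracts the step count, elapsed time, and sampling rate from such a log; the function name `parse_progress` is hypothetical, and entries with an unknown rate (`?it/s`) are skipped.

```python
import re

# Matches entries such as: "Sampling: 18%|█▊ | 180/1000 [00:05<00:25, 31.54it/s]"
PROGRESS_RE = re.compile(
    r"Sampling:\s*\d+%\|[^|]*\|\s*(\d+)/(\d+)\s*"
    r"\[(\d+):(\d+)<[^,]*,\s*([\d.]+)it/s\]"
)


def parse_progress(log):
    """Return one dict per progress entry that reports a numeric rate."""
    entries = []
    for m in PROGRESS_RE.finditer(log):
        entries.append({
            "step": int(m.group(1)),
            "total": int(m.group(2)),
            "elapsed_s": int(m.group(3)) * 60 + int(m.group(4)),
            "rate_it_per_s": float(m.group(5)),
        })
    return entries


if __name__ == "__main__":
    sample = (
        "Sampling: 0%| | 0/1000 [00:00<?, ?it/s] "
        "Sampling: 18%|█▊ | 180/1000 [00:05<00:25, 31.54it/s]"
    )
    for entry in parse_progress(sample):
        print(entry)
```

In the log above, the last entry reports step 180 of 1000 after about 5 seconds, which suggests that 1000 is an upper bound on sampling steps rather than a fixed count, though the page does not state this.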
Version Details
- Version ID: 3f5f9c195086737dda710bf504330f71e786d0a361b505e377c8b10122af9d32
- Version Created: May 30, 2025