cuuupid/zonos 📝🖼️ → 🖼️

▶️ 267 runs 📅 Feb 2025 ⚙️ Cog 0.13.7 🔗 GitHub 📄 Paper ⚖️ License
text-to-speech voice-cloning

About

Zonos-v0.1 (beta) is a state-of-the-art text-to-speech Transformer model with a wide expressive range, built by Zyphra.

Example Output

Performance Metrics

53.43s Prediction Time
100.62s Total Time
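
The two timings above differ by a sizeable margin. A minimal sketch of the arithmetic follows; the helper name `overhead_seconds` is hypothetical, and reading the difference as queueing and setup time is an assumption, since the page does not say what the remainder consists of.

```python
def overhead_seconds(total: float, prediction: float) -> float:
    """Return the part of the total time not spent in prediction.

    Hypothetical helper: the page only reports the two figures; treating
    the remainder as queueing/setup overhead is an assumption.
    """
    if prediction > total:
        raise ValueError("prediction time cannot exceed total time")
    return round(total - prediction, 2)


# Figures taken from the example run on this page.
overhead = overhead_seconds(total=100.62, prediction=53.43)
print(f"{overhead:.2f}s outside prediction")
```

For this run the remainder is 47.19s, a little under half of the total time.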
All Input Parameters
{
  "text": "I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. Respect me and I'll nurture you; ignore me and you shall face the consequences.",
  "audio": "https://replicate.delivery/pbxt/MTiggYvvLjNJAZngjPgl0IzZ1x07SRbC4m3l3y6h4D3ih1Gl/Mel_Original_MoveFirst_2.mp3"
}
Input Parameters
text (required) Type: string
Text to speak.
audio (optional) Type: string
Audio sample of the voice to mimic (the example above passes a URL).
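
A minimal sketch of how these two inputs might be assembled and sent with the Replicate Python client. The helper names `build_zonos_input` and `synthesize` are hypothetical; the model reference combines the model name from the page title with the version ID listed under Version Details. The call assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment.

```python
from typing import Optional

# Model reference: owner/name from the page title plus the version ID
# listed under "Version Details".
MODEL_REF = (
    "cuuupid/zonos:"
    "c86319441e6805516974afc719860dbba372cf1e2466997dcb67e259bc47522e"
)


def build_zonos_input(text: str, audio: Optional[str] = None) -> dict:
    """Build the input payload (hypothetical helper, not a library function).

    `text` is required; `audio` is an optional URL of a voice sample to mimic.
    """
    if not text or not text.strip():
        raise ValueError("'text' is required and must not be empty")
    payload = {"text": text}
    if audio is not None:
        payload["audio"] = audio
    return payload


def synthesize(text: str, audio: Optional[str] = None):
    """Run the model on Replicate; needs network access and an API token."""
    import replicate  # imported lazily so the helper above works without it

    return replicate.run(MODEL_REF, input=build_zonos_input(text, audio))
```

Leaving `audio` out omits the key entirely rather than sending an empty value. The exact type returned by `replicate.run` depends on the client version, so treat the result as something that resolves to the output URI described in the schema.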
Output Schema

Output

Type: string
Format: uri
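
Because the output is a URI string rather than raw audio bytes, a caller has to fetch the file separately. The sketch below uses only the Python standard library; the helper names and the fallback file name `zonos_output` are hypothetical, and the page does not state the audio format of the result.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_uri(uri: str, default: str = "zonos_output") -> str:
    """Derive a local file name from the last path segment of the URI.

    Falls back to `default` (a hypothetical name) when the path has no
    final segment.
    """
    name = posixpath.basename(urlparse(uri).path)
    return name or default


def download_output(uri: str, directory: str = ".") -> str:
    """Download the file at `uri` and return the local path (needs network)."""
    path = os.path.join(directory, filename_from_uri(uri))
    urllib.request.urlretrieve(uri, path)
    return path
```

`download_output` would be called with whatever URI the prediction returns; only `filename_from_uri` can be exercised offline.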

Example Execution Logs
0%|          | 0/2865 [00:00<?, ?it/s]
  0%|          | 1/2865 [00:00<14:18,  3.34it/s]
  1%|          | 17/2865 [00:00<00:53, 53.23it/s]
  1%|          | 33/2865 [00:00<00:33, 85.62it/s]
  2%|▏         | 49/2865 [00:00<00:26, 107.18it/s]
  2%|▏         | 65/2865 [00:00<00:23, 121.69it/s]
  3%|▎         | 81/2865 [00:00<00:21, 131.55it/s]
  3%|▎         | 97/2865 [00:00<00:20, 138.38it/s]
  4%|▍         | 113/2865 [00:01<00:19, 143.09it/s]
  5%|▍         | 129/2865 [00:01<00:18, 146.27it/s]
  5%|▌         | 145/2865 [00:01<00:18, 148.48it/s]
  6%|▌         | 161/2865 [00:01<00:18, 149.97it/s]
  6%|▌         | 177/2865 [00:01<00:17, 151.01it/s]
  7%|▋         | 193/2865 [00:01<00:17, 151.73it/s]
  7%|▋         | 209/2865 [00:01<00:17, 152.20it/s]
  8%|▊         | 225/2865 [00:01<00:17, 152.45it/s]
  8%|▊         | 241/2865 [00:01<00:17, 152.59it/s]
  9%|▉         | 257/2865 [00:01<00:17, 152.66it/s]
 10%|▉         | 273/2865 [00:02<00:16, 152.70it/s]
 10%|█         | 289/2865 [00:02<00:16, 152.71it/s]
 11%|█         | 305/2865 [00:02<00:16, 152.74it/s]
 11%|█         | 321/2865 [00:02<00:16, 152.76it/s]
 12%|█▏        | 337/2865 [00:02<00:16, 152.75it/s]
 12%|█▏        | 353/2865 [00:02<00:16, 152.73it/s]
 13%|█▎        | 369/2865 [00:02<00:16, 152.64it/s]
 13%|█▎        | 385/2865 [00:02<00:16, 152.39it/s]
 14%|█▍        | 401/2865 [00:02<00:16, 152.20it/s]
 15%|█▍        | 417/2865 [00:03<00:16, 152.19it/s]
 15%|█▌        | 433/2865 [00:03<00:15, 152.17it/s]
 16%|█▌        | 449/2865 [00:03<00:15, 152.14it/s]
 16%|█▌        | 465/2865 [00:03<00:15, 152.09it/s]
 17%|█▋        | 481/2865 [00:03<00:15, 152.09it/s]
 17%|█▋        | 497/2865 [00:03<00:15, 152.05it/s]
 18%|█▊        | 513/2865 [00:03<00:15, 151.95it/s]
 18%|█▊        | 529/2865 [00:03<00:15, 151.73it/s]
 19%|█▉        | 545/2865 [00:03<00:15, 151.61it/s]
 20%|█▉        | 561/2865 [00:03<00:15, 151.56it/s]
 20%|██        | 577/2865 [00:04<00:15, 151.51it/s]
 21%|██        | 593/2865 [00:04<00:14, 151.49it/s]
 21%|██▏       | 609/2865 [00:04<00:14, 151.42it/s]
 22%|██▏       | 625/2865 [00:04<00:14, 151.32it/s]
 22%|██▏       | 641/2865 [00:04<00:14, 151.11it/s]
 23%|██▎       | 657/2865 [00:04<00:14, 150.95it/s]
 23%|██▎       | 673/2865 [00:04<00:14, 150.89it/s]
 24%|██▍       | 689/2865 [00:04<00:14, 150.80it/s]
 25%|██▍       | 705/2865 [00:04<00:14, 150.76it/s]
 25%|██▌       | 721/2865 [00:05<00:14, 150.73it/s]
 26%|██▌       | 737/2865 [00:05<00:14, 150.63it/s]
 26%|██▋       | 753/2865 [00:05<00:14, 150.53it/s]
 27%|██▋       | 769/2865 [00:05<00:13, 150.41it/s]
 27%|██▋       | 785/2865 [00:05<00:13, 150.31it/s]
 28%|██▊       | 801/2865 [00:05<00:13, 150.24it/s]
 29%|██▊       | 817/2865 [00:05<00:13, 150.15it/s]
 29%|██▉       | 833/2865 [00:05<00:13, 150.01it/s]
 30%|██▉       | 849/2865 [00:05<00:13, 149.94it/s]
 30%|███       | 864/2865 [00:05<00:13, 149.88it/s]
 31%|███       | 879/2865 [00:06<00:13, 149.82it/s]
 31%|███       | 894/2865 [00:06<00:13, 149.44it/s]
 32%|███▏      | 909/2865 [00:06<00:13, 149.37it/s]
 32%|███▏      | 924/2865 [00:06<00:12, 149.35it/s]
 33%|███▎      | 939/2865 [00:06<00:12, 149.34it/s]
 33%|███▎      | 954/2865 [00:06<00:12, 149.31it/s]
 34%|███▍      | 969/2865 [00:06<00:12, 149.22it/s]
 34%|███▍      | 984/2865 [00:06<00:12, 149.04it/s]
 35%|███▍      | 999/2865 [00:06<00:12, 149.01it/s]
 35%|███▌      | 1014/2865 [00:06<00:12, 148.89it/s]
 36%|███▌      | 1029/2865 [00:07<00:12, 148.80it/s]
 36%|███▋      | 1044/2865 [00:07<00:12, 148.68it/s]
 37%|███▋      | 1059/2865 [00:07<00:12, 148.61it/s]
 37%|███▋      | 1074/2865 [00:07<00:12, 148.55it/s]
 38%|███▊      | 1089/2865 [00:07<00:11, 148.51it/s]
 39%|███▊      | 1104/2865 [00:07<00:11, 148.46it/s]
 39%|███▉      | 1119/2865 [00:07<00:11, 148.35it/s]
 40%|███▉      | 1134/2865 [00:07<00:11, 148.17it/s]
 40%|████      | 1149/2865 [00:07<00:11, 148.04it/s]
 41%|████      | 1164/2865 [00:07<00:11, 147.95it/s]
 41%|████      | 1179/2865 [00:08<00:11, 147.87it/s]
 42%|████▏     | 1194/2865 [00:08<00:11, 147.76it/s]
 42%|████▏     | 1209/2865 [00:08<00:11, 147.69it/s]
 43%|████▎     | 1224/2865 [00:08<00:11, 147.64it/s]
 43%|████▎     | 1239/2865 [00:08<00:11, 147.59it/s]
 44%|████▍     | 1254/2865 [00:08<00:10, 147.53it/s]
 44%|████▍     | 1269/2865 [00:08<00:10, 147.23it/s]
 45%|████▍     | 1284/2865 [00:08<00:10, 147.07it/s]
 45%|████▌     | 1299/2865 [00:08<00:10, 147.03it/s]
46%|████▌     | 1311/2865 [00:09<00:10, 145.63it/s]
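
The log lines above follow the progress-bar layout printed by tqdm: percentage, bar, `step/total`, elapsed and remaining time, and a rate in iterations per second. As a rough sketch, they could be parsed as shown below; the `Progress` type and `parse_progress` function are hypothetical names, and the regular expression is fitted only to the lines shown on this page.

```python
import re
from typing import NamedTuple, Optional


class Progress(NamedTuple):
    percent: int
    step: int
    total: int
    rate: Optional[float]  # iterations per second; None when the log shows "?"


# Matches e.g. " 46%|████▌     | 1311/2865 [00:09<00:10, 145.63it/s]"
_LINE = re.compile(
    r"(?P<pct>\d+)%\|.*?\|\s*(?P<step>\d+)/(?P<total>\d+)\s*"
    r"\[[^\]]*?,\s*(?P<rate>[\d.]+|\?)it/s\]"
)


def parse_progress(line: str) -> Optional[Progress]:
    """Parse one progress line; return None if the line does not match."""
    m = _LINE.search(line)
    if m is None:
        return None
    rate = None if m["rate"] == "?" else float(m["rate"])
    return Progress(int(m["pct"]), int(m["step"]), int(m["total"]), rate)


last = parse_progress("46%|████▌     | 1311/2865 [00:09<00:10, 145.63it/s]")
print(last)
```

Applied to the final line of the excerpt, this yields step 1311 of 2865 at 145.63 it/s. The excerpt stops at 46%, and the page does not say whether the run ended there or the log was cut off.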
Version Details
Version ID
c86319441e6805516974afc719860dbba372cf1e2466997dcb67e259bc47522e
Version Created
February 11, 2025