ttsds/metavoice

▶️ 646 runs 📅 Jan 2025 ⚙️ Cog 0.13.6
text-to-speech voice-cloning

About

Example Output

Performance Metrics

12.37s Prediction Time
272.82s Total Time
All Input Parameters
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
Input Parameters
text (required) Type: string
speaker_reference (required) Type: string
Output Schema

Output

Type: string, Format: uri
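
The two required inputs and the URI output can be exercised from Python roughly as follows. This is a minimal sketch, not an official client: `build_input` and `synthesize` are hypothetical helper names, the URL check on `speaker_reference` is an assumption based on the example input (the schema only says `string`), and the call relies on the third-party `replicate` package with `REPLICATE_API_TOKEN` set in the environment.

```python
from urllib.parse import urlparse

# Model name and version ID as listed on this page.
MODEL = "ttsds/metavoice"
VERSION = "3495610f45204d13509ef709586d9badd3bc4bd895aa712a252b249df6693143"


def build_input(text: str, speaker_reference: str) -> dict:
    """Return the input payload with both required string fields.

    Hypothetical helper: the validation rules below are assumptions,
    not constraints documented by the model.
    """
    if not text.strip():
        raise ValueError("text must be a non-empty string")
    parsed = urlparse(speaker_reference)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("speaker_reference is assumed to be an http(s) URL")
    return {"text": text, "speaker_reference": speaker_reference}


def synthesize(text: str, speaker_reference: str) -> str:
    """Run the model and return the output URI as a string."""
    # Imported lazily so build_input can be used without the client installed.
    import replicate

    output = replicate.run(
        f"{MODEL}:{VERSION}",
        input=build_input(text, speaker_reference),
    )
    # The schema describes the output as a string in URI format.
    return str(output)
```

Calling `synthesize("Hello there.", "https://example.com/reference.wav")` (a placeholder URL) would submit one prediction; only `build_input` runs without network access.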

Example Execution Logs
  0%|          | 0/2005 [00:00<?, ?it/s]
  1%|▏         | 30/2005 [00:00<00:06, 294.61it/s]
  3%|▎         | 60/2005 [00:00<00:06, 290.72it/s]
  4%|▍         | 90/2005 [00:00<00:06, 289.47it/s]
  6%|▌         | 119/2005 [00:00<00:06, 288.95it/s]
  7%|▋         | 148/2005 [00:00<00:06, 288.68it/s]
  9%|▉         | 177/2005 [00:00<00:06, 288.49it/s]
 10%|█         | 206/2005 [00:00<00:06, 288.35it/s]
 12%|█▏        | 235/2005 [00:00<00:06, 288.35it/s]
 13%|█▎        | 264/2005 [00:00<00:06, 288.20it/s]
 15%|█▍        | 293/2005 [00:01<00:05, 288.18it/s]
 16%|█▌        | 322/2005 [00:01<00:05, 288.20it/s]
 18%|█▊        | 351/2005 [00:01<00:05, 288.10it/s]
 19%|█▉        | 380/2005 [00:01<00:05, 287.93it/s]
 20%|██        | 409/2005 [00:01<00:05, 287.99it/s]
 22%|██▏       | 438/2005 [00:01<00:05, 287.91it/s]
 23%|██▎       | 467/2005 [00:01<00:05, 287.78it/s]
 25%|██▍       | 496/2005 [00:01<00:05, 287.88it/s]
 26%|██▌       | 525/2005 [00:01<00:05, 287.89it/s]
 28%|██▊       | 554/2005 [00:01<00:05, 287.99it/s]
 29%|██▉       | 583/2005 [00:02<00:04, 288.01it/s]
 31%|███       | 612/2005 [00:02<00:04, 287.97it/s]
 32%|███▏      | 641/2005 [00:02<00:04, 287.99it/s]
 33%|███▎      | 670/2005 [00:02<00:04, 288.03it/s]
 35%|███▍      | 699/2005 [00:02<00:04, 288.01it/s]
 36%|███▋      | 728/2005 [00:02<00:04, 287.94it/s]
 38%|███▊      | 757/2005 [00:02<00:04, 287.94it/s]
 39%|███▉      | 786/2005 [00:02<00:04, 287.91it/s]
 40%|███▉      | 798/2005 [00:02<00:04, 287.90it/s]
Time for 1st stage LLM inference: 2.78 sec total, 287.29 tokens/sec
Bandwidth achieved: 717.34 GB/s
Memory used: 8.94 GB
Non-causal batching:   0%|          | 0/1 [00:00<?, ?it/s]
Non-causal batching: 100%|██████████| 1/1 [00:00<00:00,  7.63it/s]
Non-causal batching: 100%|██████████| 1/1 [00:00<00:00,  7.62it/s]
2025-01-31 09:15:24 | WARNING  | DF | Audio sampling rate does not match model sampling rate (24000, 48000). Resampling...
Saved audio to /src/outputs/synth_25-01-31--09-15-24_With_tenure,_Suzie'd_have_ff59dbfc-89c0-4f09-8c4b-a815958f5cfd.wav
Total time to synth (s): 6.059656143188477
Real-time factor: 1.14
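
The figures in these logs can be cross-checked with a little arithmetic. The sketch below recomputes the first-stage token rate from the last progress line (798 tokens) and the rounded 2.78 s, which gives about 287 tokens/sec, close to the logged 287.29. It also estimates the audio length, assuming the real-time factor is defined as synthesis time divided by audio duration; that definition is an assumption, not something the logs state, and the function names are illustrative.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput of the first-stage LLM in tokens per second."""
    return tokens / seconds


def audio_duration_from_rtf(synth_seconds: float, rtf: float) -> float:
    """Estimate audio length in seconds.

    Assumes RTF = synthesis time / audio duration (not confirmed by the logs).
    """
    return synth_seconds / rtf


# Values copied from the example execution logs.
rate = tokens_per_second(798, 2.78)
duration = audio_duration_from_rtf(6.059656143188477, 1.14)

print(f"{rate:.2f} tokens/sec")
print(f"{duration:.2f} s of audio (under the assumed RTF definition)")
```

The small gap between the recomputed rate and the logged 287.29 tokens/sec is consistent with the logged time being rounded to two decimals.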
Version Details
Version ID
3495610f45204d13509ef709586d9badd3bc4bd895aa712a252b249df6693143
Version Created
January 31, 2025