ttsds/voicecraft

▶️ 524 runs 📅 Feb 2025 ⚙️ Cog 0.13.6

text-to-speech voice-cloning

Performance

79.0s • Typical run time

~233s • Cold start (first call)

524 • Total runs

About

Example Output

Performance Metrics

79.02s Prediction Time

233.15s Total Time

All Input Parameters

{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "version": "giga330m",
  "text_reference": "and keeping eternity before the eyes, though much.",
  "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}

Input Parameters

text (required) • Type: string • Text to synthesize
version • Default: giga330m • Version of the model to use
text_reference (required) • Type: string • Transcript of the reference audio
speaker_reference (required) • Type: string • Reference audio file
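The parameters above map directly onto a request payload. The following is a minimal sketch, not the model author's code: `build_input` and `synthesize` are hypothetical helper names, and the call assumes the third-party `replicate` Python client is installed with a `REPLICATE_API_TOKEN` set in the environment. The version hash is the one listed under Version Details.

```python
from typing import Dict

# Model reference in "owner/name:version" form, using the version ID from this page.
MODEL_VERSION = (
    "ttsds/voicecraft:"
    "daee148e2e9d7cf3ae863c868f2ba23a15aa3dfba7700f49316f15028bbe7884"
)


def build_input(
    text: str,
    text_reference: str,
    speaker_reference: str,
    version: str = "giga330m",
) -> Dict[str, str]:
    """Assemble the input payload and reject empty required fields.

    Hypothetical helper: the field names come from the parameter list,
    the validation is an illustrative addition.
    """
    required = {
        "text": text,
        "text_reference": text_reference,
        "speaker_reference": speaker_reference,
    }
    missing = [name for name, value in required.items() if not value.strip()]
    if missing:
        raise ValueError("missing required input(s): " + ", ".join(missing))
    return {**required, "version": version}


def synthesize(payload: Dict[str, str]):
    """Run the model through the Replicate client (network call)."""
    # Imported lazily so that build_input works without the package installed.
    import replicate

    return replicate.run(MODEL_VERSION, input=payload)
```

Calling `synthesize(build_input(...))` with the three required strings should start a prediction. Depending on the client version, the return value may be a URL string or a file-like output object, so treat the exact type as an assumption to check.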

Output Schema

Output

Type: string • Format: uri
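Because the output is a URI string rather than raw audio, the result has to be fetched in a second step. The sketch below uses only the Python standard library. `output_filename` and `download_output` are hypothetical names, and the `output.wav` fallback assumes the result is a WAV file, which the schema itself does not state.

```python
import posixpath
import urllib.request
from urllib.parse import urlparse


def output_filename(uri: str, default: str = "output.wav") -> str:
    """Derive a local file name from the last path segment of the output URI."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
    name = posixpath.basename(parsed.path)
    # Fall back to the assumed default when the URI has no usable file name.
    return name or default


def download_output(uri: str) -> str:
    """Download the file behind the output URI and return the local path."""
    destination = output_filename(uri)
    urllib.request.urlretrieve(uri, destination)  # network call
    return destination
```

Hosted output links may expire, so it is probably safer to download the file soon after the prediction finishes.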

Example Execution Logs

INFO     Setting up corpus information...
INFO     Loading corpus from source files...
   1%                                      1/100  [ 0:00:01 < -:--:-- , ? it/s ]
INFO     Found 1 speaker across 1 file, average number of utterances per speaker: 1.0
INFO     Initializing multiprocessing jobs...
INFO     Normalizing text...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Generating MFCCs...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:12 < 0:00:00 , ? it/s ]
INFO     Calculating CMVN...
INFO     Generating final features...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Creating corpus split...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Compiling training graphs...
INFO     Performing first-pass alignment...
INFO     Generating alignments...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Calculating fMLLR for speaker adaptation...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Performing second-pass alignment...
INFO     Generating alignments...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Collecting phone and word alignments from alignment lattices...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:02 < 0:00:00 , ? it/s ]
WARNING  Alignment analysis not available without using postgresql
INFO     Exporting alignment TextGrids to /tmp/tmphk_hfkzo/mfa_alignments...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:00 < 0:00:00 , ? it/s ]
INFO     Finished exporting TextGrids to /tmp/tmphk_hfkzo/mfa_alignments!
INFO     Done! Everything took 70.744 seconds
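These logs appear to come from a forced-alignment step (the export path is named `mfa_alignments`), and the final line reports its duration. Assuming the logs and the 79.02s prediction time listed under Performance Metrics belong to the same run, the short sketch below estimates how much of the prediction the alignment step accounts for. The function names are hypothetical.

```python
import re
from typing import Optional

# Matches the closing log line, e.g. "Done! Everything took 70.744 seconds".
DONE_PATTERN = re.compile(r"Done! Everything took ([0-9]+(?:\.[0-9]+)?) seconds")


def aligner_seconds(log_text: str) -> Optional[float]:
    """Return the duration reported by the final log line, if present."""
    match = DONE_PATTERN.search(log_text)
    return float(match.group(1)) if match else None


def share_of_prediction(step_seconds: float, prediction_seconds: float) -> float:
    """Fraction of the prediction time taken by one step."""
    if prediction_seconds <= 0:
        raise ValueError("prediction_seconds must be positive")
    return step_seconds / prediction_seconds


log_line = "INFO     Done! Everything took 70.744 seconds"
seconds = aligner_seconds(log_line)
print(f"{share_of_prediction(seconds, 79.02):.1%}")  # prints 89.5%
```

Under that assumption, roughly nine tenths of the prediction time goes to alignment and the remainder to speech generation and other work.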

Version Details

Version ID: daee148e2e9d7cf3ae863c868f2ba23a15aa3dfba7700f49316f15028bbe7884
Version Created: March 21, 2025
