ttsds/voicecraft

▶️ 512 runs 📅 Feb 2025 ⚙️ Cog 0.13.6
text-to-speech voice-cloning

About

Example Output

Performance Metrics

79.02s Prediction Time
233.15s Total Time
All Input Parameters
{
  "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
  "version": "giga330m",
  "text_reference": "and keeping eternity before the eyes, though much.",
  "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
}
Input Parameters
text (required) Type: string
Text to synthesize
version Default: giga330m
Version of the model to use
text_reference (required) Type: string
Transcript of reference audio
speaker_reference (required) Type: string
Reference audio file
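
The parameters above map directly onto the input dictionary shown under "All Input Parameters". The following is a minimal sketch of how a call might be assembled, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set. The helper names `build_input` and `synthesize` are hypothetical and not part of the model; the version ID is the one listed under "Version Details".

```python
# Model reference: owner/name plus the version ID from "Version Details".
MODEL_VERSION = (
    "ttsds/voicecraft:"
    "daee148e2e9d7cf3ae863c868f2ba23a15aa3dfba7700f49316f15028bbe7884"
)


def build_input(text, text_reference, speaker_reference, version=None):
    """Assemble the input dict and check the three required fields.

    Hypothetical helper: the validation here is client-side only.
    """
    required = {
        "text": text,
        "text_reference": text_reference,
        "speaker_reference": speaker_reference,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing required input(s): " + ", ".join(missing))
    payload = dict(required)
    if version is not None:
        # When omitted, the documented default (giga330m) applies.
        payload["version"] = version
    return payload


def synthesize(payload):
    """Send the payload to Replicate (needs network access and an API token)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_VERSION, input=payload)
```

Depending on the client version, the value returned by `replicate.run` may be a URL string or a file-like object wrapping that URL, so callers should not assume one or the other.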
Output Schema

Output

Type: string, Format: uri
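
Because the output is a URI rather than raw audio, the caller has to fetch the file separately. The sketch below shows one way to do that with the Python standard library. The helper names and the fallback file name `output.wav` are assumptions made for illustration; the page does not state the audio format of the output.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_name(uri, default="output.wav"):
    """Derive a local file name from the last path segment of an http(s) URI.

    The default name is an assumption; the output format is not documented here.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URI, got: " + uri)
    name = posixpath.basename(parsed.path)
    return name or default


def download(uri, directory="."):
    """Fetch the URI into the given directory and return the local path."""
    target = os.path.join(directory, local_name(uri))
    urlretrieve(uri, target)  # performs the network request
    return target
```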

Example Execution Logs
INFO     Setting up corpus information...
INFO     Loading corpus from source files...
   1%                                      1/100  [ 0:00:01 < -:--:-- , ? it/s ]
INFO     Found 1 speaker across 1 file, average number of utterances per speaker: 1.0
INFO     Initializing multiprocessing jobs...
INFO     Normalizing text...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Generating MFCCs...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:12 < 0:00:00 , ? it/s ]
INFO     Calculating CMVN...
INFO     Generating final features...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Creating corpus split...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Compiling training graphs...
INFO     Performing first-pass alignment...
INFO     Generating alignments...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Calculating fMLLR for speaker adaptation...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Performing second-pass alignment...
INFO     Generating alignments...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:01 < 0:00:00 , ? it/s ]
INFO     Collecting phone and word alignments from alignment lattices...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:02 < 0:00:00 , ? it/s ]
WARNING  Alignment analysis not available without using postgresql
INFO     Exporting alignment TextGrids to /tmp/tmphk_hfkzo/mfa_alignments...
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1  [ 0:00:00 < 0:00:00 , ? it/s ]
INFO     Finished exporting TextGrids to /tmp/tmphk_hfkzo/mfa_alignments!
INFO     Done! Everything took 70.744 seconds
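
The final log line can be set against the performance metrics reported earlier. The logs appear to come from a forced-alignment step (the export directory is named mfa_alignments), and the sketch below assumes its 70.744 seconds fall entirely within the 79.02s prediction time; the page does not confirm this. The function name `alignment_seconds` is hypothetical.

```python
import re


def alignment_seconds(log_text):
    """Extract the duration from a 'Done! Everything took N seconds' log line."""
    match = re.search(r"Everything took ([0-9]+(?:\.[0-9]+)?) seconds", log_text)
    return float(match.group(1)) if match else None


log_line = "INFO     Done! Everything took 70.744 seconds"
prediction_time = 79.02  # seconds, from "Performance Metrics"
total_time = 233.15      # seconds, from "Performance Metrics"

aligned = alignment_seconds(log_line)
share = aligned / prediction_time          # assumes alignment ran inside the prediction
outside = total_time - prediction_time     # time not counted as prediction

print(f"alignment share of prediction time: {share:.1%}")
print(f"time outside prediction: {outside:.2f}s")
```

Under that assumption, alignment accounts for roughly nine tenths of the prediction time, and about 154 seconds of the total elapse outside the prediction itself.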
Version Details
Version ID
daee148e2e9d7cf3ae863c868f2ba23a15aa3dfba7700f49316f15028bbe7884
Version Created
March 21, 2025