platform-kit/mars5-tts 📝🔢🖼️🔊 → 🖼️

▶️ 516 runs 📅 Jun 2024 ⚙️ Cog 0.8.6
text-to-speech voice-cloning

About

A novel speech model for insane prosody.

Example Output

Performance Metrics

38.29s Prediction Time
38.31s Total Time
All Input Parameters
{
  "text": "Introducing Mars5, a revolutionary open-source text-to-speech model.",
  "top_k": 100,
  "temperature": 1.1,
  "freq_penalty": 5,
  "ref_audio_file": "https://replicate.delivery/pbxt/L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav",
  "rep_penalty_window": 150,
  "ref_audio_transcript": "Hi there. I'm your new voice clone. Try your best to upload quality audio."
}
Input Parameters
text Type: string; Default: Hi there, I'm your new voice clone, powered by Mars5.
Text to synthesize
top_k Type: integer; Default: 95; Range: 0 - 100
temperature Type: number; Default: 0.5; Range: 0 - 5
freq_penalty Type: integer; Default: 3
ref_audio_file Type: string; Default: https://replicate.delivery/pbxt/L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav
Reference audio file to clone from (10 seconds or shorter)
rep_penalty_window Type: integer; Default: 95
ref_audio_transcript Type: string; Default: Hi there. I'm your new voice clone. Try your best to upload quality audio.
Text spoken in the reference audio file
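As a minimal sketch of how these parameters fit together, the helper below assembles an input payload and checks the two documented ranges (top_k and temperature). The name `build_mars5_input` is hypothetical and not part of the model; the defaults are copied from the list above, and how the payload is actually submitted (for example through a Replicate client) is left out because it is an assumption about your setup.

```python
from typing import Any, Dict

# Defaults copied from the Input Parameters list above.
DEFAULTS: Dict[str, Any] = {
    "text": "Hi there, I'm your new voice clone, powered by Mars5.",
    "top_k": 95,
    "temperature": 0.5,
    "freq_penalty": 3,
    "ref_audio_file": (
        "https://replicate.delivery/pbxt/"
        "L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav"
    ),
    "rep_penalty_window": 95,
    "ref_audio_transcript": (
        "Hi there. I'm your new voice clone. "
        "Try your best to upload quality audio."
    ),
}


def build_mars5_input(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides}

    # Only top_k and temperature have documented ranges.
    if not 0 <= payload["top_k"] <= 100:
        raise ValueError("top_k must be in the range 0 - 100")
    if not 0 <= payload["temperature"] <= 5:
        raise ValueError("temperature must be in the range 0 - 5")
    return payload


# Reproduces the settings from the example run shown earlier.
example = build_mars5_input(
    text="Introducing Mars5, a revolutionary open-source text-to-speech model.",
    top_k=100,
    temperature=1.1,
    freq_penalty=5,
    rep_penalty_window=150,
)
```

Checking ranges locally is a convenience only; the model may enforce its own validation when the prediction is created.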
Output Schema

Output

Type: string; Format: uri
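Because the output is a plain URI string, a caller typically needs to check it and pick a local file name before downloading. The sketch below does only that, using the standard library; `output_filename` and the fallback name `mars5_output` are hypothetical, and the page does not state the audio format of the returned file.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(uri: str, fallback: str = "mars5_output") -> str:
    """Hypothetical helper: derive a local file name from an output URI."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unexpected URI scheme: {parsed.scheme!r}")
    # Last path component, or the fallback when the path has none.
    name = PurePosixPath(parsed.path).name
    return name or fallback


# Demonstrated on the reference audio URL from this page, since the
# example output URI itself is not shown.
name = output_filename(
    "https://replicate.delivery/pbxt/"
    "L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav"
)
```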

Example Execution Logs
>>> Running inference
Note: using deep clone. Assuming input `c_phones` is concatenated prompt and output phones. Also assuming no padded indices in `c_codes`.
New x: torch.Size([1, 1221, 8]) | new x_known: torch.Size([1, 1221, 8]) . Base prompt: torch.Size([1, 428, 8]). New padding mask: torch.Size([1, 1221]) | m shape: torch.Size([1, 1221, 8])
>>>>> Done with inference
Version Details
Version ID
6aed0f11f3ba7b13d59ab3228355e7b1ea943479673cc57e10e99ba766536811
Version Created
June 25, 2024