platform-kit/mars5-tts 📝🔢🖼️🔊 → 🖼️

▶️ 535 runs 📅 Jun 2024 ⚙️ Cog 0.8.6

text-to-speech voice-cloning

About

A novel speech model for insane prosody.

Example Output

Output

Performance Metrics

38.29s Prediction Time

38.31s Total Time

All Input Parameters

{
  "text": "Introducing Mars5, a revolutionary open-source text-to-speech model.",
  "top_k": 100,
  "temperature": 1.1,
  "freq_penalty": 5,
  "ref_audio_file": "https://replicate.delivery/pbxt/L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav",
  "rep_penalty_window": 150,
  "ref_audio_transcript": "Hi there. I'm your new voice clone. Try your best to upload quality audio."
}
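
The payload above could be sent to the model roughly as follows. This is a minimal sketch, assuming the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` is set in the environment. The model reference is assembled from the model name and version ID shown on this page. `run_mars5` and its `runner` parameter are hypothetical names introduced here so the call can be exercised without network access. The result is passed through `str()` because some client versions may return a file-like object rather than a plain URI string.

```python
# Model name and version ID as listed on this page.
MODEL_REF = (
    "platform-kit/mars5-tts:"
    "6aed0f11f3ba7b13d59ab3228355e7b1ea943479673cc57e10e99ba766536811"
)

# The example input shown above, as a Python dict.
EXAMPLE_INPUT = {
    "text": "Introducing Mars5, a revolutionary open-source text-to-speech model.",
    "top_k": 100,
    "temperature": 1.1,
    "freq_penalty": 5,
    "ref_audio_file": (
        "https://replicate.delivery/pbxt/"
        "L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav"
    ),
    "rep_penalty_window": 150,
    "ref_audio_transcript": (
        "Hi there. I'm your new voice clone. "
        "Try your best to upload quality audio."
    ),
}


def run_mars5(payload, runner=None):
    """Send `payload` to the model and return the output URI as a string.

    `runner` is a hypothetical hook for testing; by default the call goes
    through `replicate.run`.
    """
    if runner is None:
        import replicate  # imported lazily so the sketch loads without it

        runner = replicate.run
    output = runner(MODEL_REF, input=payload)
    return str(output)


if __name__ == "__main__":
    # Performs a real, billable prediction when run directly.
    print(run_mars5(EXAMPLE_INPUT))
```

Based on the metrics reported above, a call like this can be expected to take on the order of tens of seconds.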

Input Parameters

text
    Type: string
    Default: Hi there, I'm your new voice clone, powered by Mars5.
    Description: Text to synthesize

top_k
    Type: integer
    Default: 95
    Range: 0 - 100

temperature
    Type: number
    Default: 0.5
    Range: 0 - 5

freq_penalty
    Type: integer
    Default: 3

ref_audio_file
    Type: string
    Default: https://replicate.delivery/pbxt/L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav
    Description: Reference audio file to clone from (<= 10 seconds)

rep_penalty_window
    Type: integer
    Default: 95

ref_audio_transcript
    Type: string
    Default: Hi there. I'm your new voice clone. Try your best to upload quality audio.
    Description: Text in the reference audio file
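
The parameters listed above can be checked on the client side before a request is submitted. The sketch below fills in the documented defaults and rejects values outside the documented ranges. `build_input` is a hypothetical helper, not part of the model or of any client library, and it assumes the ranges are inclusive at both ends (the example input on this page uses `top_k` of 100, which suggests the upper bound is allowed).

```python
# Defaults copied from the parameter list on this page.
DEFAULTS = {
    "text": "Hi there, I'm your new voice clone, powered by Mars5.",
    "top_k": 95,
    "temperature": 0.5,
    "freq_penalty": 3,
    "ref_audio_file": (
        "https://replicate.delivery/pbxt/"
        "L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav"
    ),
    "rep_penalty_window": 95,
    "ref_audio_transcript": (
        "Hi there. I'm your new voice clone. "
        "Try your best to upload quality audio."
    ),
}

# Documented ranges; treated as inclusive (an assumption).
RANGES = {"top_k": (0, 100), "temperature": (0, 5)}

INTEGER_FIELDS = ("top_k", "freq_penalty", "rep_penalty_window")
STRING_FIELDS = ("text", "ref_audio_file", "ref_audio_transcript")


def build_input(**overrides):
    """Merge overrides into the defaults and validate types and ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides}

    for name in INTEGER_FIELDS:
        value = payload[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")

    temperature = payload["temperature"]
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        raise TypeError("temperature must be a number")

    for name in STRING_FIELDS:
        if not isinstance(payload[name], str):
            raise TypeError(f"{name} must be a string")

    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    return payload
```

For example, `build_input(top_k=100, temperature=1.1)` returns a full payload with the remaining fields set to their defaults, while `build_input(top_k=101)` raises `ValueError`.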

Output Schema

Output

Type: string • Format: uri
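
Since the output is a URI string, the generated audio still has to be fetched before it can be used locally. The following is a minimal sketch using only the Python standard library. `filename_from_uri` and `download_output` are hypothetical helper names, and the `output.wav` fallback is an assumption, since this page does not state the audio format of the output.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_uri(uri, fallback="output.wav"):
    """Derive a local file name from the last path segment of the URI."""
    name = os.path.basename(urlparse(uri).path)
    # Fall back to a generic name (an assumption) if the path has no file part.
    return name or fallback


def download_output(uri, dest_dir="."):
    """Stream the resource at `uri` into `dest_dir` and return the local path."""
    dest = os.path.join(dest_dir, filename_from_uri(uri))
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

Streaming with `shutil.copyfileobj` avoids holding the whole audio file in memory, which matters little for short clips but costs nothing.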

Example Execution Logs

>>> Running inference
Note: using deep clone. Assuming input `c_phones` is concatenated prompt and output phones. Also assuming no padded indices in `c_codes`.
New x: torch.Size([1, 1221, 8]) | new x_known: torch.Size([1, 1221, 8]) . Base prompt: torch.Size([1, 428, 8]). New padding mask: torch.Size([1, 1221]) | m shape: torch.Size([1, 1221, 8])
>>>>> Done with inference

Version Details

Version ID: 6aed0f11f3ba7b13d59ab3228355e7b1ea943479673cc57e10e99ba766536811
Version Created: June 25, 2024
