camb-ai/mars5-tts

680 runs • Jun 2024 • Cog 0.9.9

text-to-speech voice-cloning

About

MARS5 is a fully open-source, commercially usable text-to-speech (TTS) and voice-cloning model that its authors describe as offering breakthrough prosody and realism.

Performance Metrics

112.56s Prediction Time

230.44s Total Time
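
The two timings above can be compared directly. The short sketch below computes the gap between total time and prediction time, and the share of the total spent in prediction. The page does not say what the remaining time consists of; treating it as queueing and startup overhead is an assumption, not something the source states.

```python
# Timings copied from the "Performance Metrics" section of this page.
prediction_s = 112.56
total_s = 230.44

# Time not spent in prediction (assumed, not confirmed, to be
# queueing and model startup overhead).
overhead_s = round(total_s - prediction_s, 2)

# Percentage of the total wall-clock time spent in prediction.
prediction_share = round(100 * prediction_s / total_s, 1)

print(f"Non-prediction time: {overhead_s} s")
print(f"Prediction share of total: {prediction_share} %")
```

For this run, a little under half of the total time was spent in prediction itself.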

All Input Parameters

{
  "text": "Hey there, this is a test.",
  "ref_audio_file": "https://files.catbox.moe/be6df3.wav",
  "ref_audio_transcript": "We actually haven't managed to meet demand."
}

Input Parameters

text (string; default: "We actually haven't managed to meet demand."): Text to synthesize
top_k (integer; default: 95; range: 0 - 100)
temperature (number; default: 0.5; range: 0 - 5)
freq_penalty (integer; default: 3)
ref_audio_file (string; default: https://storage.googleapis.com/cambai-prod-bucket/be6df3.wav): Reference audio file to clone from (<= 10 seconds)
rep_penalty_window (integer; default: 95)
ref_audio_transcript (string; default: "Hi there. How are you doing?"): Text in the reference audio file
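
As a rough illustration of how the parameters above fit together, the following sketch builds an input payload, checks the two documented ranges, and submits it with the Replicate Python client. The helper names `build_mars5_input` and `run_mars5` are hypothetical and not part of any library. The sketch assumes the third-party `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set; the version ID is the one listed under "Version Details" on this page.

```python
from typing import Any, Dict

# Model reference: owner/name plus the version ID shown on this page.
MODEL_REF = (
    "camb-ai/mars5-tts:"
    "0ddbf483082c507b102318ee81d42ba55fd2cfdb42589e5f24e39fae5dea86ae"
)


def build_mars5_input(
    text: str,
    ref_audio_file: str,
    ref_audio_transcript: str,
    top_k: int = 95,
    temperature: float = 0.5,
    freq_penalty: int = 3,
    rep_penalty_window: int = 95,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the input dict and check documented ranges."""
    if not 0 <= top_k <= 100:
        raise ValueError("top_k must be in the range 0 - 100")
    if not 0 <= temperature <= 5:
        raise ValueError("temperature must be in the range 0 - 5")
    return {
        "text": text,
        "top_k": top_k,
        "temperature": temperature,
        "freq_penalty": freq_penalty,
        "ref_audio_file": ref_audio_file,
        "rep_penalty_window": rep_penalty_window,
        "ref_audio_transcript": ref_audio_transcript,
    }


def run_mars5(payload: Dict[str, Any]) -> Any:
    """Hypothetical helper: submit the payload via the Replicate client."""
    # Imported lazily so the payload helper works without the package installed.
    import replicate

    return replicate.run(MODEL_REF, input=payload)


# Inputs taken from the "All Input Parameters" example on this page.
payload = build_mars5_input(
    text="Hey there, this is a test.",
    ref_audio_file="https://files.catbox.moe/be6df3.wav",
    ref_audio_transcript="We actually haven't managed to meet demand.",
)
```

Calling `run_mars5(payload)` would start a prediction; it is left out of the sketch so the block can be run without network access or credentials.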

Output Schema

Output

Type: string • Format: uri
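
Since the output is a URI string, a caller typically needs to fetch the audio it points to. The sketch below uses only the Python standard library to do that. The helper names `output_filename` and `save_output` are hypothetical, and the sketch assumes the output arrives as a plain URL string, as the schema states; some client versions may wrap file outputs in another object.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def output_filename(url: str) -> str:
    """Hypothetical helper: derive a local file name from the URI path."""
    name = Path(urlparse(url).path).name
    # Fall back to a generic name if the URI has no file component.
    return name or "output.wav"


def save_output(url: str, dest_dir: str = ".") -> Path:
    """Hypothetical helper: stream the file at `url` into `dest_dir`."""
    dest = Path(dest_dir) / output_filename(url)
    with urlopen(url) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

For example, `save_output("https://files.catbox.moe/ulk1ao.wav")` would write `ulk1ao.wav` to the current directory, provided the URL is still reachable.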

Example Execution Logs

>>>> Ref Audio file: /tmp/tmpdb0qyxi0be6df3.wav; ref_transcript: We actually haven't managed to meet demand.
>>> Running inference
Note: using deep clone. Assuming input `c_phones` is concatenated prompt and output phones. Also assuming no padded indices in `c_codes`.
New x: torch.Size([1, 990, 8]) | new x_known: torch.Size([1, 990, 8]) . Base prompt: torch.Size([1, 215, 8]). New padding mask: torch.Size([1, 990]) | m shape: torch.Size([1, 990, 8])
>>>>> Done with inference
[16384/717228] bytes |>                    |
[32768/717228] bytes |>                    |
[49152/717228] bytes |=>                   |
[65536/717228] bytes |=>                   |
[81920/717228] bytes |==>                  |
[98304/717228] bytes |==>                  |
[114688/717228] bytes |===>                 |
[131072/717228] bytes |===>                 |
[147456/717228] bytes |====>                |
[163840/717228] bytes |====>                |
[180224/717228] bytes |=====>               |
[196608/717228] bytes |=====>               |
[212992/717228] bytes |=====>               |
[229376/717228] bytes |======>              |
[245760/717228] bytes |======>              |
[262144/717228] bytes |=======>             |
[278528/717228] bytes |=======>             |
[294912/717228] bytes |========>            |
[311296/717228] bytes |========>            |
[327680/717228] bytes |=========>           |
[344064/717228] bytes |=========>           |
[360448/717228] bytes |==========>          |
[376832/717228] bytes |==========>          |
[393216/717228] bytes |==========>          |
[409600/717228] bytes |===========>         |
[425984/717228] bytes |===========>         |
[442368/717228] bytes |============>        |
[458752/717228] bytes |============>        |
[475136/717228] bytes |=============>       |
[491520/717228] bytes |=============>       |
[507904/717228] bytes |==============>      |
[524288/717228] bytes |==============>      |
[540672/717228] bytes |===============>     |
[557056/717228] bytes |===============>     |
[573440/717228] bytes |===============>     |
[589824/717228] bytes |================>    |
[606208/717228] bytes |================>    |
[622592/717228] bytes |=================>   |
[638976/717228] bytes |=================>   |
[655360/717228] bytes |==================>  |
[671744/717228] bytes |==================>  |
[688128/717228] bytes |===================> |
[704512/717228] bytes |===================> |
[717228/717228] bytes |====================>|
>>>> Output file url: https://files.catbox.moe/ulk1ao.wav

Version Details

Version ID: 0ddbf483082c507b102318ee81d42ba55fd2cfdb42589e5f24e39fae5dea86ae
Version Created: July 11, 2024
