camb-ai/mars5-tts 📝🔢🖼️🔊 → 🖼️
About
MARS5 is a fully open-source, commercially usable voice-cloning and text-to-speech (TTS) model with breakthrough prosody and realism.

Example Output
Output: https://files.catbox.moe/ulk1ao.wav
Performance Metrics
- Prediction Time: 112.56s
- Total Time: 230.44s
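The two timings above can be compared directly. The short sketch below computes how much of the total time was not spent in prediction. Reading that difference as queueing, model loading, or file transfer overhead is an assumption; the page does not say what the remaining time covers.

```python
# Timings reported for this example run, in seconds.
prediction_s = 112.56
total_s = 230.44

# Time not accounted for by prediction (assumed to be setup, queueing, or transfer).
overhead_s = round(total_s - prediction_s, 2)

# Fraction of the total time spent in prediction.
prediction_share = prediction_s / total_s

print(f"non-prediction time: {overhead_s:.2f}s")
print(f"prediction share of total: {prediction_share:.1%}")
```

For this run, a little under half of the total time was spent in prediction itself.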
All Input Parameters
{ "text": "Hey there, this is a test.", "ref_audio_file": "https://files.catbox.moe/be6df3.wav", "ref_audio_transcript": "We actually haven't managed to meet demand." }
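As a rough illustration, a payload like the one above could be assembled and checked before it is submitted. This is a minimal sketch, not the hosting service's client: `build_input` is a hypothetical helper, and treating `text`, `ref_audio_file`, and `ref_audio_transcript` as required is an assumption based only on the fact that the example run sets exactly those three keys.

```python
import json

# Assumption: these three keys are required because the example run sets them.
REQUIRED_KEYS = ("text", "ref_audio_file", "ref_audio_transcript")

# Optional parameter names listed on this page; their defaults are not documented here.
OPTIONAL_KEYS = {"top_k", "temperature", "freq_penalty", "rep_penalty_window"}


def build_input(text, ref_audio_file, ref_audio_transcript, **optional):
    """Assemble an input payload (hypothetical helper, not part of MARS5)."""
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {
        "text": text,
        "ref_audio_file": ref_audio_file,
        "ref_audio_transcript": ref_audio_transcript,
    }
    for key in REQUIRED_KEYS:
        if not payload[key]:
            raise ValueError(f"missing required parameter: {key}")

    # Optional values are forwarded only when explicitly set by the caller.
    payload.update(optional)
    return payload


payload = build_input(
    text="Hey there, this is a test.",
    ref_audio_file="https://files.catbox.moe/be6df3.wav",
    ref_audio_transcript="We actually haven't managed to meet demand.",
)
print(json.dumps(payload, indent=2))
```

Leaving the optional sampling parameters out of the payload, as the example run does, avoids guessing at values the page does not document.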
Input Parameters
- text: Text to synthesize
- top_k
- temperature
- freq_penalty
- ref_audio_file: Reference audio file to clone from (10 seconds or shorter)
- rep_penalty_window
- ref_audio_transcript: Text spoken in the reference audio file
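Since the reference audio is limited to 10 seconds, it can be worth checking a clip's length locally before submitting it. The sketch below uses only Python's standard `wave` module and therefore handles uncompressed PCM WAV files only. The helper names are hypothetical, and whether the model rejects or truncates longer clips is not stated on this page.

```python
import io
import wave

MAX_REF_SECONDS = 10.0  # limit stated for ref_audio_file


def wav_duration_seconds(source):
    """Return the duration of a PCM WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(source, max_seconds=MAX_REF_SECONDS):
    """True when the clip is no longer than the stated limit."""
    return wav_duration_seconds(source) <= max_seconds


def make_silent_wav(seconds, sample_rate=16000):
    """Build an in-memory mono 16-bit silent WAV, used here only for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


print(is_valid_reference(make_silent_wav(8)))   # clip within the limit
print(is_valid_reference(make_silent_wav(12)))  # clip over the limit
```

In practice the same check would be pointed at a local copy of the reference clip, for example `is_valid_reference("reference.wav")`, where the file name is a placeholder.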
Output Schema
Output
Example Execution Logs
>>>> Ref Audio file: /tmp/tmpdb0qyxi0be6df3.wav; ref_transcript: We actually haven't managed to meet demand.
>>> Running inference
Note: using deep clone. Assuming input `c_phones` is concatenated prompt and output phones. Also assuming no padded indices in `c_codes`.
New x: torch.Size([1, 990, 8]) | new x_known: torch.Size([1, 990, 8]) . Base prompt: torch.Size([1, 215, 8]). New padding mask: torch.Size([1, 990]) | m shape: torch.Size([1, 990, 8])
>>>>> Done with inference
[16384/717228] bytes |> |
...
[717228/717228] bytes |====================>|
>>>> Output file url: https://files.catbox.moe/ulk1ao.wav
Version Details
- Version ID: 0ddbf483082c507b102318ee81d42ba55fd2cfdb42589e5f24e39fae5dea86ae
- Version Created: July 11, 2024