platform-kit/mars5-tts
About
A novel speech model for insane prosody.

Example Output
Performance Metrics
- Prediction Time: 38.29 s
- Total Time: 38.31 s
All Input Parameters
{
  "text": "Introducing Mars5, a revolutionary open-source text-to-speech model.",
  "top_k": 100,
  "temperature": 1.1,
  "freq_penalty": 5,
  "ref_audio_file": "https://replicate.delivery/pbxt/L9a6SelzU0B2DIWeNpkNR0CKForWSbkswoUP69L0NLjLswVV/voice_sample.wav",
  "rep_penalty_window": 150,
  "ref_audio_transcript": "Hi there. I'm your new voice clone. Try your best to upload quality audio."
}
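The example input sets four sampling parameters (top_k, temperature, freq_penalty, rep_penalty_window) that this page does not describe. The sketch below shows how parameters with these names are *commonly* combined when picking the next audio token: a frequency penalty over a sliding window of recent tokens, then temperature scaling, then top-k sampling. It is an illustration under that assumption, not MARS5's actual sampler; the function name `sample_next_token` and the exact penalty formula (subtract `freq_penalty` once per recent occurrence) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    history: Sequence[int],
    top_k: int = 100,
    temperature: float = 1.1,
    freq_penalty: float = 5.0,
    rep_penalty_window: int = 150,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical sampler: pick one token index from raw logits."""
    rng = rng or random.Random()
    adjusted = list(logits)

    # Assumed frequency penalty: every time a token appears among the last
    # rep_penalty_window generated tokens, lower its logit by freq_penalty.
    recent = history[-rep_penalty_window:] if rep_penalty_window > 0 else []
    for token, count in Counter(recent).items():
        adjusted[token] -= freq_penalty * count

    # Temperature: values above 1 flatten the distribution, below 1 sharpen it.
    scaled = [x / temperature for x in adjusted]

    # Top-k: keep only the k highest-scoring candidates.
    k = min(top_k, len(scaled))
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]

    # Softmax weights over the survivors (subtract the max for numerical stability).
    peak = max(scaled[i] for i in candidates)
    weights = [math.exp(scaled[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For instance, with `top_k=1` the call is deterministic: logits `[5.0, 1.0, 0.0]` and an empty history pick token 0, but if token 0 appears twice in the recent window with `freq_penalty=5.0`, its logit drops to -5.0 and token 1 wins. Under this reading, a larger `rep_penalty_window` makes the model avoid repeats over a longer stretch of audio.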
Input Parameters
- text: Text to synthesize
- top_k
- temperature
- freq_penalty
- ref_audio_file: Reference audio file to clone the voice from (10 seconds or shorter)
- rep_penalty_window
- ref_audio_transcript: Transcript of the speech in the reference audio file
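Because the reference clip must be 10 seconds or shorter, it can help to check a local file before submitting a prediction. The sketch below assumes the reference audio is an uncompressed PCM WAV file (readable by Python's standard `wave` module) and assembles an input dictionary using the field names listed above. The helper names `wav_duration_seconds` and `build_input` are hypothetical, and the sampling values are simply copied from the example input, not confirmed model defaults. How the file is actually uploaded depends on your client and is not shown.

```python
import wave
from typing import Any, Dict

# Limit taken from the ref_audio_file description ("<= 10 seconds").
MAX_REF_SECONDS = 10.0

# Values copied from the example input on this page; not necessarily defaults.
EXAMPLE_SAMPLING = {"top_k": 100, "temperature": 1.1, "freq_penalty": 5, "rep_penalty_window": 150}


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, using only the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_input(text: str, ref_audio: str, ref_audio_transcript: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: validate a local reference clip and build the input dict."""
    if not text.strip():
        raise ValueError("text must not be empty")
    duration = wav_duration_seconds(ref_audio)
    if duration > MAX_REF_SECONDS:
        raise ValueError(f"reference audio is {duration:.2f} s; the limit is {MAX_REF_SECONDS} s")
    unknown = set(overrides) - set(EXAMPLE_SAMPLING)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Here ref_audio_file holds the local path; replace it with whatever your client uploads.
    return {
        "text": text,
        "ref_audio_file": ref_audio,
        "ref_audio_transcript": ref_audio_transcript,
        **EXAMPLE_SAMPLING,
        **overrides,
    }
```

Rejecting unknown keyword arguments catches typos such as `temprature` locally instead of producing a failed or silently misconfigured prediction.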
Output Schema
Output
Example Execution Logs
>>> Running inference
Note: using deep clone. Assuming input `c_phones` is concatenated prompt and output phones. Also assuming no padded indices in `c_codes`.
New x: torch.Size([1, 1221, 8]) | new x_known: torch.Size([1, 1221, 8]) . Base prompt: torch.Size([1, 428, 8]). New padding mask: torch.Size([1, 1221]) | m shape: torch.Size([1, 1221, 8])
>>>>> Done with inference
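The tensor shapes in the log can be turned into rough durations. The sketch below assumes the shapes are (batch, frames, codebooks), that `x` holds the reference prompt followed by the generated output (as the deep-clone note suggests), and that the codes come from a 24 kHz EnCodec-style codec with a 320-sample hop (75 frames per second) and 8 codebooks. None of these codec details are stated on this page, so treat the resulting numbers as estimates.

```python
# Assumed codec parameters (not stated on this page): 24 kHz audio,
# 320-sample hop, so 75 code frames per second, 8 codebooks per frame.
SAMPLE_RATE_HZ = 24_000
HOP_SAMPLES = 320
FRAMES_PER_SECOND = SAMPLE_RATE_HZ / HOP_SAMPLES  # 75.0


def frames_to_seconds(frames: int) -> float:
    """Convert a number of code frames to seconds of audio."""
    return frames / FRAMES_PER_SECOND


total_frames = 1221   # from "New x: torch.Size([1, 1221, 8])"
prompt_frames = 428   # from "Base prompt: torch.Size([1, 428, 8])"
generated_frames = total_frames - prompt_frames  # 793, assuming x = prompt + output

print(f"reference prompt ~ {frames_to_seconds(prompt_frames):.2f} s")     # 5.71 s
print(f"generated speech ~ {frames_to_seconds(generated_frames):.2f} s")  # 10.57 s
```

Under these assumptions the roughly 5.7 s prompt is consistent with the 10-second limit on `ref_audio_file`, and the example prediction produced about 10.6 s of speech.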
Version Details
- Version ID: 6aed0f11f3ba7b13d59ab3228355e7b1ea943479673cc57e10e99ba766536811
- Version Created: June 25, 2024