chenxwh/cosyvoice2-0.5b
About
Scalable Streaming Speech Synthesis with Large Language Models

Example Output
Performance Metrics
- Prediction time: 5.97 s
- Total time: 42.57 s
All Input Parameters
{
  "task": "zero-shot voice clone",
  "tts_text": "Every stage is a fresh adventure, and as the lights ignite, it's an unspoken pact between me and the audience, weaving unforgettable nights where dreams meet reality.",
  "instruction": "",
  "source_audio": "https://replicate.delivery/pbxt/MCyjoMjdC1WlvhMHzNhylKOrz97Vy0dFRM8ciNtq5siWG3pj/En_3_prompt.wav",
  "source_transcript": "I'm so happy I got to do this. I really wanted to work with Tom Hooper. I know that he records live and he films and records your vocals live. It's such an interesting thing to me and I wanted to see him work. I had actually done screen tests for Les Mis."
}
Input Parameters
- task: Task to run, e.g. "zero-shot voice clone" as in the example above
- tts_text (required): Text to synthesize into speech
- instruction: Instruction for the Instructed Voice Generation task
- source_audio (required): Source audio, i.e. the voice prompt to clone
- source_transcript (required): Transcript of the source audio; you can produce it first with a speech-recognition model such as Whisper
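As a sketch of how the parameters above fit together, the snippet below assembles the example input as a Python dict and rejects blank required fields. The helper `build_input` and the constants `MODEL_REF` and `REQUIRED_FIELDS` are hypothetical illustrations, not part of any published client; the field names and the version ID come from this page, and optional fields are simply omitted when not given so that the model's own (undocumented here) defaults would apply.

```python
from typing import Dict, Optional

# Hypothetical constant: Replicate's "owner/name:version" reference,
# built from the version ID listed under Version Details.
MODEL_REF = (
    "chenxwh/cosyvoice2-0.5b:"
    "669b1cd618f2747d2237350e868f5c313f3b548fc803ca4e57adfaba778b042d"
)

# Fields marked "(required)" in the Input Parameters list.
REQUIRED_FIELDS = ("tts_text", "source_audio", "source_transcript")


def build_input(
    tts_text: str,
    source_audio: str,
    source_transcript: str,
    task: Optional[str] = None,
    instruction: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build the input dict and reject blank required fields.

    Optional fields are left out when not given, so the model's own defaults
    would apply instead.
    """
    payload = {
        "tts_text": tts_text,
        "source_audio": source_audio,
        "source_transcript": source_transcript,
    }
    missing = [name for name in REQUIRED_FIELDS if not payload[name].strip()]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    if task is not None:
        payload["task"] = task
    if instruction is not None:
        payload["instruction"] = instruction
    return payload


# The zero-shot voice clone example from "All Input Parameters".
payload = build_input(
    task="zero-shot voice clone",
    tts_text=(
        "Every stage is a fresh adventure, and as the lights ignite, it's an "
        "unspoken pact between me and the audience, weaving unforgettable "
        "nights where dreams meet reality."
    ),
    source_audio=(
        "https://replicate.delivery/pbxt/"
        "MCyjoMjdC1WlvhMHzNhylKOrz97Vy0dFRM8ciNtq5siWG3pj/En_3_prompt.wav"
    ),
    source_transcript=(
        "I'm so happy I got to do this. I really wanted to work with Tom "
        "Hooper. I know that he records live and he films and records your "
        "vocals live. It's such an interesting thing to me and I wanted to "
        "see him work. I had actually done screen tests for Les Mis."
    ),
)
```

If the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set, the payload could then be sent with `replicate.run(MODEL_REF, input=payload)`; what that call returns for this model is not described on this page.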
Output Schema
Output
Example Execution Logs
0%| | 0/1 [00:00<?, ?it/s]
2024-12-25 23:56:15,104 INFO synthesis text Every stage is a fresh adventure, and as the lights ignite, it's an unspoken pact between me and the audience, weaving unforgettable nights where dreams meet reality.
2024-12-25 23:56:19,660 INFO yield speech len 8.4, rtf 0.5424436784925915
100%|██████████| 1/1 [00:05<00:00, 5.49s/it]
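The `rtf` in the log is a real-time factor. Assuming it is computed as wall-clock synthesis time divided by the duration in seconds of the generated audio (`speech len`), the two log timestamps should reproduce it. The sketch below checks that assumption on this example only. An RTF below 1 means the 8.4 s clip was produced faster than real time.

```python
import re
from datetime import datetime

# Two lines copied from the example execution log above.
SYNTH_LINE = (
    "2024-12-25 23:56:15,104 INFO synthesis text Every stage is a fresh "
    "adventure, and as the lights ignite, it's an unspoken pact between me "
    "and the audience, weaving unforgettable nights where dreams meet reality."
)
YIELD_LINE = "2024-12-25 23:56:19,660 INFO yield speech len 8.4, rtf 0.5424436784925915"

# Layout of the leading timestamp (date, time, comma, milliseconds).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_time(line: str) -> datetime:
    """Parse the 23-character timestamp at the start of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


match = re.search(r"yield speech len ([\d.]+), rtf ([\d.]+)", YIELD_LINE)
speech_len = float(match.group(1))  # seconds of generated audio (assumed unit)
logged_rtf = float(match.group(2))

# Time between "synthesis text" and "yield speech" approximates synthesis time.
elapsed = (log_time(YIELD_LINE) - log_time(SYNTH_LINE)).total_seconds()
estimated_rtf = elapsed / speech_len

print(f"elapsed {elapsed:.3f} s, estimated RTF {estimated_rtf:.4f}, logged RTF {logged_rtf:.4f}")
```

Both values come out at roughly 0.542, consistent with the assumed definition. The small residual difference would come from the logger's timestamps not coinciding exactly with the timer used inside the model code.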
Version Details
- Version ID: 669b1cd618f2747d2237350e868f5c313f3b548fc803ca4e57adfaba778b042d
- Version created: December 25, 2024