jichengdu/cosyvoice
About
CosyVoice2-0.5B: Scalable Streaming Speech Synthesis with Large Language Models

Example Output
Output
Performance Metrics
- Prediction Time: 2.38s
- Total Time: 122.78s
All Input Parameters
{ "task": "zero-shot voice clone", "tts_text": "白日依山尽,黄河入海流。", "instruction": "", "source_audio": "https://replicate.delivery/pbxt/MgbBQRAKfZkuc9EcspUou25Uxfdgc3xWS43kvqIla8eWBsaQ/zero_shot_prompt.wav", "source_transcript": "希望你以后能够做得比我还好哟!" }
Input Parameters
- task: Task type. One of zero-shot voice cloning, cross-lingual voice cloning, or instructed voice generation.
- tts_text (required): Text of the audio to generate.
- instruction: Instruction text for the instructed voice generation task.
- source_audio (required): Source audio file, used as the reference voice.
- source_transcript (required): Transcript of the source (reference) audio.
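The parameters above can be assembled into a request as in the following minimal sketch. It assumes the official `replicate` Python client is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are hypothetical, and the default `task` value is simply copied from the example input on this page; it is not a documented default of the model.

```python
# Sketch only: the payload is built and validated locally; the network call is optional.

# Model reference built from the page title and the Version ID listed on this page.
MODEL_REF = (
    "jichengdu/cosyvoice:"
    "4106862b8e948847f2d7e1513eb1b03e7bd07333343dda94c83cf78d82eb3f1d"
)

# Fields marked "(required)" in the parameter list.
REQUIRED_FIELDS = ("tts_text", "source_audio", "source_transcript")


def build_input(tts_text, source_audio, source_transcript,
                task="zero-shot voice clone", instruction=""):
    """Return an input dict shaped like the example on this page (hypothetical helper)."""
    payload = {
        "task": task,
        "tts_text": tts_text,
        "instruction": instruction,
        "source_audio": source_audio,
        "source_transcript": source_transcript,
    }
    # Reject empty values for the required fields before any request is sent.
    missing = [name for name in REQUIRED_FIELDS if not payload[name]]
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    return payload


def run_prediction(payload):
    """Send the payload to Replicate (assumes the third-party `replicate` package)."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_prediction(build_input(...))` with the values from "All Input Parameters" should reproduce a request like the example. The shape of the returned output is not described on this page, so the sketch returns it unmodified.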
Output Schema
Output
Example Execution Logs
0%| | 0/1 [00:00<?, ?it/s]
2025-03-19 10:41:32,648 INFO synthesis text 白日依山尽,黄河入海流。
2025-03-19 10:41:34,538 INFO yield speech len 4.04, rtf 0.4677236080169678
100%|██████████| 1/1 [00:02<00:00, 2.20s/it]
100%|██████████| 1/1 [00:02<00:00, 2.20s/it]
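The `rtf` value in the log above can be sanity-checked against the timestamps. The sketch below assumes that `rtf` is the real-time factor, i.e. synthesis time divided by the duration of the generated audio, and that `speech len` is in seconds; neither is stated on this page. Under that assumption, multiplying the two should approximately match the gap between the two log timestamps.

```python
import re
from datetime import datetime

# The two INFO entries from the example execution log.
LOG = (
    "2025-03-19 10:41:32,648 INFO synthesis text 白日依山尽,黄河入海流。 "
    "2025-03-19 10:41:34,538 INFO yield speech len 4.04, rtf 0.4677236080169678"
)

# Extract the reported audio length and real-time factor.
match = re.search(r"yield speech len ([\d.]+), rtf ([\d.]+)", LOG)
speech_len = float(match.group(1))
rtf = float(match.group(2))

# Assumption: rtf = synthesis_time / speech_len, so synthesis_time = rtf * speech_len.
elapsed_from_rtf = speech_len * rtf

# Independent estimate: difference between the two log timestamps.
stamps = re.findall(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}", LOG)
start, end = (datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f") for s in stamps)
elapsed_from_stamps = (end - start).total_seconds()

print(f"from rtf:        {elapsed_from_rtf:.3f} s")
print(f"from timestamps: {elapsed_from_stamps:.3f} s")
```

Both estimates come out near 1.89 seconds, which is consistent with the assumed definition. An `rtf` below 1 means the audio was generated faster than its playback duration.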
Version Details
- Version ID: 4106862b8e948847f2d7e1513eb1b03e7bd07333343dda94c83cf78d82eb3f1d
- Version Created: March 19, 2025