jichengdu/cosyvoice ❓📝🖼️ → 🖼️

▶️ 1.7K runs 📅 Jan 2025 ⚙️ Cog 0.14.2
multilingual text-to-speech voice-cloning

About

CosyVoice2-0.5B: Scalable Streaming Speech Synthesis with Large Language Models

Example Output

Performance Metrics

2.38s Prediction Time
122.78s Total Time
All Input Parameters
{
  "task": "zero-shot voice clone",
  "tts_text": "白日依山尽,黄河入海流。",
  "instruction": "",
  "source_audio": "https://replicate.delivery/pbxt/MgbBQRAKfZkuc9EcspUou25Uxfdgc3xWS43kvqIla8eWBsaQ/zero_shot_prompt.wav",
  "source_transcript": "希望你以后能够做得比我还好哟!"
}
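
The input above could be sent from Python roughly as follows. This is a minimal sketch, not the model author's own client code: it assumes the official `replicate` Python package is installed and `REPLICATE_API_TOKEN` is set, and the helper names `build_input` and `run_prediction` are hypothetical. The model reference combines the model name with the version ID listed under Version Details.

```python
import json

# Model name plus the version ID shown under "Version Details" on this page.
MODEL_REF = (
    "jichengdu/cosyvoice:"
    "4106862b8e948847f2d7e1513eb1b03e7bd07333343dda94c83cf78d82eb3f1d"
)


def build_input(tts_text, source_audio, source_transcript,
                task="zero-shot voice clone", instruction=""):
    """Assemble the input dict using the parameter names listed on this page."""
    return {
        "task": task,
        "tts_text": tts_text,
        "instruction": instruction,
        "source_audio": source_audio,
        "source_transcript": source_transcript,
    }


def run_prediction(payload):
    """Send the payload to Replicate (hypothetical helper; needs network access)."""
    import replicate  # imported lazily so build_input works without the client
    return replicate.run(MODEL_REF, input=payload)


# Same values as the example input shown above.
example_input = build_input(
    tts_text="白日依山尽,黄河入海流。",
    source_audio=(
        "https://replicate.delivery/pbxt/"
        "MgbBQRAKfZkuc9EcspUou25Uxfdgc3xWS43kvqIla8eWBsaQ/zero_shot_prompt.wav"
    ),
    source_transcript="希望你以后能够做得比我还好哟!",
)

if __name__ == "__main__":
    print(json.dumps(example_input, ensure_ascii=False, indent=2))
```

Calling `run_prediction(example_input)` is expected to return a reference to the generated audio, matching the URI output described in the Output Schema section.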
Input Parameters
task Default: zero-shot voice clone
Task type: zero-shot voice clone, cross-lingual voice clone, or instructed voice generation
tts_text (required) Type: string
Text of the audio to generate
instruction Type: string Default: (empty)
Instruction text for the instructed voice generation task
source_audio (required) Type: string
Source audio file (the reference source)
source_transcript (required) Type: string
Transcript of the source audio (the reference source)
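
The required and defaulted parameters listed above can be checked on the client side before a request is made. The following is only an illustrative sketch: `normalize_input` is a hypothetical helper, and the rules it applies (three required strings, two defaults) are taken from this page rather than from the model's actual schema.

```python
# Parameters marked "(required)" on this page.
REQUIRED = ("tts_text", "source_audio", "source_transcript")
# Parameters with a default listed on this page.
DEFAULTS = {"task": "zero-shot voice clone", "instruction": ""}


def normalize_input(params):
    """Return params with defaults filled in, or raise ValueError on bad input."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    unknown = sorted(set(params) - set(REQUIRED) - set(DEFAULTS))
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))

    merged = {**DEFAULTS, **params}
    for name, value in merged.items():
        if not isinstance(value, str):
            raise ValueError(name + " must be a string")
    return merged
```

For example, passing only the three required fields yields a dict whose `task` is `zero-shot voice clone` and whose `instruction` is the empty string.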
Output Schema

Output

Type: string Format: uri
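
Because the output is a URI string, the generated audio still has to be downloaded. A minimal sketch using only the Python standard library is shown here; the helper names and the `output.wav` fallback filename are assumptions, since this page does not state the audio format of the output.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_uri(uri, fallback="output.wav"):
    """Derive a local filename from the last path segment of an http(s) URI."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URI, got: " + uri)
    # URL paths always use forward slashes, hence posixpath rather than os.path.
    name = posixpath.basename(parsed.path)
    return name or fallback  # fallback name is an assumption, not from the page


def save_output(uri, directory="."):
    """Download the file behind the URI into directory and return its path."""
    destination = os.path.join(directory, filename_from_uri(uri))
    urlretrieve(uri, destination)  # performs the actual network request
    return destination
```

If the client library returns a file-like wrapper instead of a plain string, converting it with `str(...)` before calling `save_output` may be needed; that depends on the client version in use.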

Example Execution Logs
0%|          | 0/1 [00:00<?, ?it/s]
2025-03-19 10:41:32,648 INFO synthesis text 白日依山尽,黄河入海流。
2025-03-19 10:41:34,538 INFO yield speech len 4.04, rtf 0.4677236080169678
100%|██████████| 1/1 [00:02<00:00,  2.20s/it]
100%|██████████| 1/1 [00:02<00:00,  2.20s/it]
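
The `rtf` value in the log can be cross-checked against the timestamps. The sketch below assumes that `rtf` means real-time factor, that is, synthesis time divided by the duration of the generated audio, and that `speech len` is in seconds; neither is stated on this page. The function name `summarize` is hypothetical. The regular expressions use `search`, so any progress-bar text preceding a timestamp is tolerated.

```python
import re
from datetime import datetime

# The two INFO lines from the example execution log.
LOG_LINES = [
    "2025-03-19 10:41:32,648 INFO synthesis text 白日依山尽,黄河入海流。",
    "2025-03-19 10:41:34,538 INFO yield speech len 4.04, rtf 0.4677236080169678",
]

TIMESTAMPED = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO (.*)")
YIELD = re.compile(r"yield speech len ([\d.]+), rtf ([\d.]+)")


def parse_timestamp(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS,mmm' log timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def summarize(log_lines):
    """Compare the logged rtf with the time elapsed between the two log lines."""
    start = None
    for line in log_lines:
        match = TIMESTAMPED.search(line)
        if not match:
            continue
        stamp, message = parse_timestamp(match.group(1)), match.group(2)
        if message.startswith("synthesis text"):
            start = stamp
        yielded = YIELD.search(message)
        if yielded and start is not None:
            speech_len = float(yielded.group(1))
            rtf = float(yielded.group(2))
            return {
                "speech_len_s": speech_len,
                "rtf": rtf,
                # Assumed relation: compute time = rtf * audio duration.
                "implied_compute_s": speech_len * rtf,
                "log_elapsed_s": (stamp - start).total_seconds(),
            }
    return None


if __name__ == "__main__":
    print(summarize(LOG_LINES))
```

Under these assumptions, the roughly 1.89 s between the two log lines agrees with 4.04 s of audio multiplied by an rtf of about 0.468, so the audio was generated faster than real time.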
Version Details
Version ID
4106862b8e948847f2d7e1513eb1b03e7bd07333343dda94c83cf78d82eb3f1d
Version Created
March 19, 2025