jichengdu/cosyvoice

▶️ 1.8K runs 📅 Jan 2025 ⚙️ Cog 0.14.2 🔗 GitHub 📄 Paper ⚖️ License

multilingual speech-style-transfer text-to-speech voice-cloning

Performance

2.4s Typical run time

~123s Cold start (first call)

1.8K Total runs

About

CosyVoice2-0.5B: Scalable Streaming Speech Synthesis with Large Language Models

Example Output

Output

Performance Metrics

2.38s Prediction Time

122.78s Total Time

All Input Parameters

{
  "task": "zero-shot voice clone",
  "tts_text": "白日依山尽，黄河入海流。",
  "instruction": "",
  "source_audio": "https://replicate.delivery/pbxt/MgbBQRAKfZkuc9EcspUou25Uxfdgc3xWS43kvqIla8eWBsaQ/zero_shot_prompt.wav",
  "source_transcript": "希望你以后能够做得比我还好哟！"
}

Input Parameters

task (Default: zero-shot voice clone): Task type: zero-shot voice cloning, cross-lingual voice cloning, or instructed voice generation
tts_text (required, Type: string): Text of the audio to generate
instruction (Type: string, Default: empty): Instruction for the instructed voice generation task
source_audio (required, Type: string): Source (reference) audio file
source_transcript (required, Type: string): Transcript of the source (reference) audio
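
The parameters above map directly onto a call through the Replicate Python client. The following is a minimal sketch rather than the model author's own usage: `build_input` is a hypothetical helper introduced here to check the three required fields, and `synthesize` assumes the third-party `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The version hash is the one listed under Version Details.

```python
# Model reference: owner/name plus the version ID from Version Details.
MODEL = (
    "jichengdu/cosyvoice:"
    "4106862b8e948847f2d7e1513eb1b03e7bd07333343dda94c83cf78d82eb3f1d"
)

# Fields the page marks as (required).
REQUIRED = ("tts_text", "source_audio", "source_transcript")


def build_input(tts_text, source_audio, source_transcript,
                task="zero-shot voice clone", instruction=""):
    """Hypothetical helper: assemble the input dict and check required fields."""
    payload = {
        "task": task,
        "tts_text": tts_text,
        "instruction": instruction,
        "source_audio": source_audio,
        "source_transcript": source_transcript,
    }
    missing = [key for key in REQUIRED if not payload[key]]
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    return payload


def synthesize(payload):
    """Run the model remotely; needs network access and an API token."""
    import replicate  # third-party client, imported lazily
    return replicate.run(MODEL, input=payload)
```

Calling `synthesize(build_input(...))` with the values from the example input should return a reference to the generated audio. Depending on the client version this may be a plain URL string or a file-like wrapper around it, so treat the return type as an assumption to verify.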

Output Schema

Output

Type: string • Format: uri
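
Because the output is a URI string rather than raw audio, a caller typically has to download it. The sketch below shows one way to do that with the Python standard library only. The helper names `filename_from_uri` and `save_output` are hypothetical, and the `output.wav` fallback name is an assumption, since the page does not state the output file format.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_from_uri(uri, default="output.wav"):
    """Use the last path segment of the URI, or fall back to a default name."""
    name = Path(urlparse(uri).path).name
    return name or default


def save_output(uri, dest_dir="."):
    """Download the URI returned by the model into dest_dir and return the path."""
    dest = Path(dest_dir) / filename_from_uri(uri)
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)  # stream to disk without loading it all
    return dest
```

Delivery URLs on hosted inference services are often temporary, so saving the file soon after the prediction finishes is the safer choice.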

Example Execution Logs

0%|          | 0/1 [00:00<?, ?it/s]
2025-03-19 10:41:32,648 INFO synthesis text 白日依山尽，黄河入海流。
2025-03-19 10:41:34,538 INFO yield speech len 4.04, rtf 0.4677236080169678
100%|██████████| 1/1 [00:02<00:00,  2.20s/it]
100%|██████████| 1/1 [00:02<00:00,  2.20s/it]
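
The `rtf` value in the log can be cross-checked against the timestamps. Assuming `rtf` is the real-time factor (synthesis time divided by audio duration, which is an interpretation of the log and not something this page states), 4.04 s of speech at an RTF of about 0.468 implies roughly 1.89 s of synthesis, which matches the gap between the two timestamped log lines. A short sketch of that arithmetic:

```python
import re
from datetime import datetime

# The two timestamped lines from the example execution logs.
LOG = """\
2025-03-19 10:41:32,648 INFO synthesis text 白日依山尽，黄河入海流。
2025-03-19 10:41:34,538 INFO yield speech len 4.04, rtf 0.4677236080169678
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
YIELD_RE = re.compile(r"yield speech len ([\d.]+), rtf ([\d.]+)")


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp (23 characters)."""
    return datetime.strptime(line[:23], TS_FORMAT)


lines = LOG.strip().splitlines()
start, end = parse_timestamp(lines[0]), parse_timestamp(lines[1])
elapsed = (end - start).total_seconds()  # wall-clock gap between the log lines

match = YIELD_RE.search(lines[1])
speech_len, rtf = float(match.group(1)), float(match.group(2))
estimated = speech_len * rtf  # synthesis time implied by the RTF definition

print(f"elapsed from timestamps: {elapsed:.3f} s")
print(f"speech_len * rtf:        {estimated:.3f} s")
```

An RTF below 1 means the audio was generated faster than it takes to play back.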

Version Details

Version ID: 4106862b8e948847f2d7e1513eb1b03e7bd07333343dda94c83cf78d82eb3f1d
Version Created: March 19, 2025
