chenxwh/cosyvoice2-0.5b ❓📝🖼️ → 🖼️

▶️ 6.3K runs 📅 Dec 2024 ⚙️ Cog 0.9.23 🔗 GitHub 📄 Paper ⚖️ License
speech-style-transfer text-to-speech voice-cloning

About

Scalable Streaming Speech Synthesis with Large Language Models

Example Output

Performance Metrics

5.97s Prediction Time
42.57s Total Time
All Input Parameters
{
  "task": "zero-shot voice clone",
  "tts_text": "Every stage is a fresh adventure, and as the lights ignite, it's an unspoken pact between me and the audience, weaving unforgettable nights where dreams meet reality.",
  "instruction": "",
  "source_audio": "https://replicate.delivery/pbxt/MCyjoMjdC1WlvhMHzNhylKOrz97Vy0dFRM8ciNtq5siWG3pj/En_3_prompt.wav",
  "source_transcript": "I'm so happy I got to do this. I really wanted to work with Tom Hooper. I know that he records live and he films and records your vocals live. It's such an interesting thing to me and I wanted to see him work. I had actually done screen tests for Les Mis."
}
Input Parameters
task Default: zero-shot voice clone
tts_text (required) Type: string
Text of the audio to generate
instruction Type: string Default: "" (empty string)
Instruction for the instructed voice generation task
source_audio (required) Type: string
Source audio
source_transcript (required) Type: string
Transcript of the source audio. You can use a model such as Whisper to transcribe it first.
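
The parameters above can be assembled into a request roughly as follows. This is a minimal sketch, not the model author's code: `build_input` and `synthesize` are hypothetical helper names, and the call assumes the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The parameter names, the default task, and the version ID are taken from this page.

```python
# Model reference built from the name and Version ID shown on this page.
MODEL_REF = (
    "chenxwh/cosyvoice2-0.5b:"
    "669b1cd618f2747d2237350e868f5c313f3b548fc803ca4e57adfaba778b042d"
)

# Parameters marked "(required)" in the listing above.
REQUIRED_FIELDS = ("tts_text", "source_audio", "source_transcript")


def build_input(tts_text, source_audio, source_transcript,
                task="zero-shot voice clone", instruction=""):
    """Assemble the input payload (hypothetical helper, not part of any API)."""
    payload = {
        "task": task,
        "tts_text": tts_text,
        "instruction": instruction,
        "source_audio": source_audio,
        "source_transcript": source_transcript,
    }
    # Reject empty values for the required parameters before sending anything.
    missing = [name for name in REQUIRED_FIELDS if not payload[name]]
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")
    return payload


def synthesize(payload):
    """Run the model remotely (assumes `pip install replicate` and an API token)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_REF, input=payload)
```

Passing the result of `build_input(...)` to `synthesize` should yield the output described under Output Schema. Depending on the client version, the URI may arrive wrapped in a file-like object rather than as a plain string.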
Output Schema

Output

Type: string Format: uri

Example Execution Logs
0%|          | 0/1 [00:00<?, ?it/s]
2024-12-25 23:56:15,104 INFO synthesis text Every stage is a fresh adventure, and as the lights ignite, it's an unspoken pact between me and the audience, weaving unforgettable nights where dreams meet reality.
2024-12-25 23:56:19,660 INFO yield speech len 8.4, rtf 0.5424436784925915
100%|██████████| 1/1 [00:05<00:00,  5.49s/it]
100%|██████████| 1/1 [00:05<00:00,  5.49s/it]
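
The `rtf` value in the log above can be cross-checked against the log timestamps. The sketch below assumes that `len` is the generated audio duration in seconds and that `rtf` (real-time factor) is synthesis time divided by that duration; neither is stated on this page. The helper names are illustrative, and the first log line is shortened to the part that matters.

```python
import re
from datetime import datetime

# The two timestamped lines from the execution log (first one shortened).
LOG_START = "2024-12-25 23:56:15,104 INFO synthesis text Every stage is a fresh adventure"
LOG_YIELD = "2024-12-25 23:56:19,660 INFO yield speech len 8.4, rtf 0.5424436784925915"

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamp(line):
    """Parse the leading 23-character timestamp of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def parse_yield(line):
    """Extract (speech length, rtf) from a 'yield speech' log line."""
    match = re.search(r"yield speech len ([\d.]+), rtf ([\d.]+)", line)
    if match is None:
        raise ValueError("not a 'yield speech' log line")
    return float(match.group(1)), float(match.group(2))


speech_len, rtf = parse_yield(LOG_YIELD)
elapsed = (parse_timestamp(LOG_YIELD) - parse_timestamp(LOG_START)).total_seconds()

print(f"audio length: {speech_len:.1f} s")
print(f"synthesis time implied by rtf: {speech_len * rtf:.2f} s")
print(f"elapsed between log lines: {elapsed:.3f} s")
print(f"rtf recomputed from timestamps: {elapsed / speech_len:.3f}")
```

Under these assumptions the two estimates agree closely: about 4.56 s of compute for 8.4 s of audio, so synthesis ran at roughly twice real time in this example.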
Version Details
Version ID
669b1cd618f2747d2237350e868f5c313f3b548fc803ca4e57adfaba778b042d
Version Created
December 25, 2024