chenxwh/cosyvoice2-0.5b

6.3K runs • Dec 2024 • Cog 0.9.23

speech-style-transfer text-to-speech voice-cloning

About

Scalable Streaming Speech Synthesis with Large Language Models

Example Output

Performance Metrics

5.97s Prediction Time

42.57s Total Time

All Input Parameters

{
  "task": "zero-shot voice clone",
  "tts_text": "Every stage is a fresh adventure, and as the lights ignite, it's an unspoken pact between me and the audience, weaving unforgettable nights where dreams meet reality.",
  "instruction": "",
  "source_audio": "https://replicate.delivery/pbxt/MCyjoMjdC1WlvhMHzNhylKOrz97Vy0dFRM8ciNtq5siWG3pj/En_3_prompt.wav",
  "source_transcript": "I'm so happy I got to do this. I really wanted to work with Tom Hooper. I know that he records live and he films and records your vocals live. It's such an interesting thing to me and I wanted to see him work. I had actually done screen tests for Les Mis."
}

Input Parameters

task Default: zero-shot voice clone
tts_text (required) Type: string. Text of the audio to generate
instruction Type: string. Default: empty string. Instruction for the instructed voice generation task
source_audio (required) Type: string. Source audio
source_transcript (required) Type: string. Transcript of the source audio; a model such as Whisper can be used to transcribe it first
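
The parameters above can be assembled and sent with the Replicate Python client. The following is a minimal sketch, not an official recipe: `build_input` and `run_prediction` are hypothetical helper names, the `replicate` package is assumed to be installed, and `REPLICATE_API_TOKEN` is assumed to be set in the environment. The version hash is the one listed under Version Details.

```python
# Model reference in "owner/name:version" form, using the version ID from this page.
MODEL_VERSION = (
    "chenxwh/cosyvoice2-0.5b:"
    "669b1cd618f2747d2237350e868f5c313f3b548fc803ca4e57adfaba778b042d"
)

# Fields marked "(required)" in the parameter list.
REQUIRED_FIELDS = ("tts_text", "source_audio", "source_transcript")


def build_input(tts_text, source_audio, source_transcript,
                task="zero-shot voice clone", instruction=""):
    """Assemble the input payload (hypothetical helper, not part of any library)."""
    payload = {
        "task": task,
        "tts_text": tts_text,
        "instruction": instruction,
        "source_audio": source_audio,
        "source_transcript": source_transcript,
    }
    # Fail early on empty required fields instead of waiting for a remote error.
    missing = [name for name in REQUIRED_FIELDS if not payload[name].strip()]
    if missing:
        raise ValueError("missing required input(s): " + ", ".join(missing))
    return payload


def run_prediction(payload):
    """Send the payload to Replicate and return whatever the client returns."""
    # Imported lazily so build_input can be used without the package installed.
    import replicate
    return replicate.run(MODEL_VERSION, input=payload)
```

Depending on the client version, the return value may be a URL string or a file-like output object, so treat the result type as something to check rather than assume.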

Output Schema

Output

Type: string • Format: uri
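
Because the output is a URI string, the generated audio has to be fetched separately. A minimal sketch using only the standard library is shown here; `output_filename` and `download_output` are hypothetical helpers, and the `output.wav` fallback name is an assumption, since the page does not state the audio format.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def output_filename(uri, default="output.wav"):
    """Derive a local file name from the URI path, falling back to a default."""
    name = Path(urlparse(uri).path).name
    return name or default


def download_output(uri, dest_dir="."):
    """Stream the file at `uri` into `dest_dir` and return the local path."""
    dest = Path(dest_dir) / output_filename(uri)
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```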

Example Execution Logs

0%|          | 0/1 [00:00<?, ?it/s]
2024-12-25 23:56:15,104 INFO synthesis text Every stage is a fresh adventure, and as the lights ignite, it's an unspoken pact between me and the audience, weaving unforgettable nights where dreams meet reality.
2024-12-25 23:56:19,660 INFO yield speech len 8.4, rtf 0.5424436784925915
100%|██████████| 1/1 [00:05<00:00,  5.49s/it]
100%|██████████| 1/1 [00:05<00:00,  5.49s/it]
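
The `rtf` value in the log is a real-time factor: processing time divided by the duration of the generated audio, so values below 1 mean synthesis ran faster than playback. The sketch below re-derives it from the two timestamped log lines. It assumes the "synthesis text" and "yield speech" timestamps bracket the synthesis interval, which the log itself does not confirm; `estimate_rtf` is a hypothetical helper.

```python
import re
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
YIELD = re.compile(r"yield speech len ([\d.]+), rtf ([\d.]+)")


def parse_timestamp(line):
    """Extract the first log timestamp found in a line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError("no timestamp in line")
    return datetime.strptime(match.group(0), LOG_TIME_FORMAT)


def estimate_rtf(start_line, yield_line):
    """Return (estimated_rtf, logged_rtf) for one synthesis/yield pair."""
    match = YIELD.search(yield_line)
    if match is None:
        raise ValueError("not a 'yield speech' line")
    speech_len = float(match.group(1))  # seconds of audio produced
    logged_rtf = float(match.group(2))  # value reported by the model
    elapsed = (parse_timestamp(yield_line) - parse_timestamp(start_line)).total_seconds()
    return elapsed / speech_len, logged_rtf
```

For the log above, the two timestamps are 4.556 s apart and the speech length is 8.4 s, giving roughly 0.54. The small difference from the logged value is expected, since the logged speech length is rounded.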

Version Details

Version ID: 669b1cd618f2747d2237350e868f5c313f3b548fc803ca4e57adfaba778b042d
Version Created: December 25, 2024
