qwen/qwen3-tts

⭐ Official ▶️ 95.5K runs 📅 Jan 2026 ⚙️ Cog 0.16.9 ⚖️ License
multilingual text-to-speech voice-cloning

About

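A unified text-to-speech demo with three modes: custom voice (preset speakers), voice clone (cloning a voice from reference audio), and voice design (creating a voice from a natural-language description).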

Example Output

Performance Metrics

3.08s Prediction Time
3.09s Total Time
All Input Parameters
{
  "mode": "custom_voice",
  "text": "Hello, I'm Aiden and it's very nice to meet you",
  "speaker": "Aiden",
  "language": "auto"
}
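The example input above could be submitted from Python roughly as follows. This is a minimal sketch, not an official snippet: it assumes the `replicate` client package is installed and a `REPLICATE_API_TOKEN` environment variable is set, and the helper names `model_ref` and `run_tts` are hypothetical. The version ID is the one listed under Version Details on this page.

```python
from typing import Optional

MODEL = "qwen/qwen3-tts"
# Version ID as listed under "Version Details" on this page.
VERSION = "501be1210291d541fb5656bbe4808e6290470741029a34004f19e20f6d2365e8"

# The example input shown on this page.
EXAMPLE_INPUT = {
    "mode": "custom_voice",
    "text": "Hello, I'm Aiden and it's very nice to meet you",
    "speaker": "Aiden",
    "language": "auto",
}


def model_ref(model: str = MODEL, version: Optional[str] = VERSION) -> str:
    """Return 'owner/name:version', or 'owner/name' when no version is pinned."""
    return f"{model}:{version}" if version else model


def run_tts(inputs: dict):
    """Submit inputs to the model (hypothetical helper).

    Needs network access and an API token, so it is not called here.
    """
    import replicate  # imported lazily so the helpers above work without it

    return replicate.run(model_ref(), input=inputs)
```

Calling `run_tts(EXAMPLE_INPUT)` would start a prediction; pinning the version keeps results reproducible if the model is updated later.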
Input Parameters
mode Default: custom_voice
TTS mode: 'custom_voice' uses preset speakers, 'voice_clone' clones from reference audio, 'voice_design' creates voice from description
text (required) Type: string
Text to synthesize into speech
speaker Default: Serena
Preset speaker voice (only for 'custom_voice' mode)
language Default: auto
Language of the text (use 'auto' for automatic detection)
reference_text Type: string
Transcript of the reference audio (recommended for 'voice_clone' mode)
reference_audio Type: string
Reference audio file for voice cloning (only for 'voice_clone' mode)
style_instruction Type: string
Optional style/emotion instruction (e.g., 'speak slowly and calmly', 'excited tone')
voice_description Type: string
Natural language description of desired voice (only for 'voice_design' mode). Example: 'A warm, friendly female voice with a slight British accent'
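Because several parameters apply to one mode only, it can help to check an input locally before sending it. The sketch below encodes the mode restrictions described above. The helper `build_input` is hypothetical, and treating `reference_audio` and `voice_description` as required for their modes is an assumption: the page marks only `text` as required, and the model's own server-side validation may differ.

```python
MODES = ("custom_voice", "voice_clone", "voice_design")

# Parameters the page describes as applying to a single mode only.
MODE_ONLY_FIELDS = {
    "speaker": "custom_voice",
    "reference_audio": "voice_clone",
    "voice_description": "voice_design",
}

# Assumption (not stated on the page): these modes need their own field.
ASSUMED_REQUIRED = {
    "voice_clone": "reference_audio",
    "voice_design": "voice_description",
}


def build_input(text: str, mode: str = "custom_voice", **options) -> dict:
    """Build an input dict, rejecting fields that do not match the mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not text:
        raise ValueError("'text' is required")
    for field, owner in MODE_ONLY_FIELDS.items():
        if field in options and mode != owner:
            raise ValueError(f"{field!r} only applies to mode {owner!r}")
    needed = ASSUMED_REQUIRED.get(mode)
    if needed and not options.get(needed):
        raise ValueError(f"mode {mode!r} expects {needed!r}")
    return {"mode": mode, "text": text, **options}
```

For example, `build_input("Hi", speaker="Aiden")` returns a `custom_voice` input, while passing `speaker` together with `mode="voice_design"` raises a `ValueError`.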
Output Schema

Output

Type: string
Format: uri
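Since the output is a URI rather than raw audio, the file has to be fetched separately. The following is a minimal sketch using only the Python standard library; `save_output` is a hypothetical helper, and it assumes the output is available as a plain URI string (some client versions may wrap outputs in a file-like object instead). The audio format is not stated on this page, so the caller chooses the destination file name.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, destination: str) -> Path:
    """Download the resource at `uri` and write it to `destination`."""
    dest = Path(destination)
    # Create the target directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(uri) as response, dest.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```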

Example Execution Logs
Setting `pad_token_id` to `eos_token_id`:2150 for open-end generation.
Version Details
Version ID
501be1210291d541fb5656bbe4808e6290470741029a34004f19e20f6d2365e8
Version Created
January 23, 2026