qwen/qwen3-tts

Official · 1.0M runs · Jan 2026 · Cog 0.19.3

Performance

Typical run time: 3.1s

Total runs: 1.0M

A unified Text-to-Speech demo featuring three powerful modes: Voice, Clone and Design

Prediction Time: 3.08s

Total Time: 3.09s

All Input Parameters

{
  "mode": "custom_voice",
  "text": "Hello, I'm Aiden and it's very nice to meet you",
  "speaker": "Aiden",
  "language": "auto"
}

Input Parameters

mode (Default: custom_voice): TTS mode. 'custom_voice' uses preset speakers, 'voice_clone' clones a voice from reference audio, and 'voice_design' creates a voice from a description.
text (required, Type: string): Text to synthesize into speech.
speaker (Default: Serena): Preset speaker voice (only for 'custom_voice' mode).
language (Default: auto): Language of the text (use 'auto' for automatic detection).
reference_text (Type: string, Default: null): Transcript of the reference audio (recommended for 'voice_clone' mode).
reference_audio (Type: string, Default: null): Reference audio file for voice cloning (only for 'voice_clone' mode).
style_instruction (Type: string, Default: null): Optional style or emotion instruction (e.g., 'speak slowly and calmly', 'excited tone').
voice_description (Type: string, Default: null): Natural-language description of the desired voice (only for 'voice_design' mode). Example: 'A warm, friendly female voice with a slight British accent'.
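
Because several parameters only apply to one mode, it can help to assemble the input in code before sending it. The sketch below is a hypothetical helper, not part of the model or any client library: build_tts_input is an invented name, and treating reference_audio and voice_description as mandatory in their modes is an assumption (the listing only says they are "only for" those modes).

```python
from typing import Optional

# The three mode values listed in the parameter table.
MODES = ("custom_voice", "voice_clone", "voice_design")


def build_tts_input(
    text: str,
    mode: str = "custom_voice",
    language: str = "auto",
    speaker: Optional[str] = None,
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    voice_description: Optional[str] = None,
    style_instruction: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build an input payload with per-mode checks."""
    if not text:
        raise ValueError("text is required")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    payload = {"mode": mode, "text": text, "language": language}

    if mode == "custom_voice":
        # Serena is the documented default speaker.
        payload["speaker"] = speaker or "Serena"
    elif mode == "voice_clone":
        # Assumption: cloning cannot work without a reference clip.
        if not reference_audio:
            raise ValueError("voice_clone needs reference_audio")
        payload["reference_audio"] = reference_audio
        if reference_text:  # recommended by the listing, not required
            payload["reference_text"] = reference_text
    else:  # voice_design
        # Assumption: design mode needs a description to work from.
        if not voice_description:
            raise ValueError("voice_design needs voice_description")
        payload["voice_description"] = voice_description

    # The listing does not tie style_instruction to a mode, so it is
    # passed through whenever it is set.
    if style_instruction:
        payload["style_instruction"] = style_instruction
    return payload
```

Calling build_tts_input("Hello, I'm Aiden and it's very nice to meet you", speaker="Aiden") reproduces the example input shown earlier. The resulting dictionary could then be handed to whatever client is in use, for instance as the input argument of replicate.run in the Replicate Python client.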

Output Schema

Output

Type: string • Format: uri
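
Since the output is a URI string rather than raw audio, the caller has to fetch it. A minimal sketch using only the Python standard library follows; save_output is a hypothetical name, and the audio container format is not stated in the listing, so the destination file name is left to the caller.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> Path:
    """Stream the resource at `uri` into `dest` and return the written path."""
    target = Path(dest)
    # urlopen returns a file-like response; copy it in chunks so that
    # large audio files are not held in memory all at once.
    with urllib.request.urlopen(uri) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For example, save_output(output_uri, "speech_output") would write the generated audio to a local file, where output_uri stands for the string returned by a prediction.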

Example Execution Logs

Setting `pad_token_id` to `eos_token_id`:2150 for open-end generation.

Version Details

Version ID: d490a561cf1171a8dc3d96d1e57efffea7dd34607148bb641f3d9de4e38c472e
Version Created: May 8, 2026
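
To get reproducible results, a caller may want to pin this exact version rather than whatever is latest. The sketch below assumes the hosting platform accepts references of the form owner/name:version, as Replicate's clients do; pinned_ref is a hypothetical helper name.

```python
MODEL = "qwen/qwen3-tts"
VERSION_ID = "d490a561cf1171a8dc3d96d1e57efffea7dd34607148bb641f3d9de4e38c472e"


def pinned_ref(model: str, version_id: str) -> str:
    """Join a model name and a version ID as `owner/name:version`."""
    return f"{model}:{version_id}"


# Assumed reference format; check the client documentation before relying on it.
REF = pinned_ref(MODEL, VERSION_ID)
```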