qwen/qwen3-tts
About
A unified text-to-speech demo with three modes: Custom Voice (preset speakers), Voice Clone (cloning from reference audio), and Voice Design (a voice created from a text description).
Example Output
Output
Performance Metrics
- Prediction Time: 3.08s
- Total Time: 3.09s
All Input Parameters
{
"mode": "custom_voice",
"text": "Hello, I'm Aiden and it's very nice to meet you",
"speaker": "Aiden",
"language": "auto"
}
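The input above is plain JSON, so it can be assembled and serialized with any JSON library. The following is a minimal Python sketch, not the service's documented client. It assumes the hosting service accepts a request body shaped like `{"version": ..., "input": ...}`; that wrapper shape and the `build_request_body` helper are assumptions made for illustration and are not stated on this page.

```python
import json

# Version ID as listed under "Version Details" on this page.
VERSION_ID = "501be1210291d541fb5656bbe4808e6290470741029a34004f19e20f6d2365e8"


def build_request_body(tts_input: dict, version: str = VERSION_ID) -> str:
    """Serialize a prediction request.

    Hypothetical helper: the {"version", "input"} wrapper is an assumed shape.
    """
    return json.dumps({"version": version, "input": tts_input})


# The example input shown above, expressed as a Python dict.
example_input = {
    "mode": "custom_voice",
    "text": "Hello, I'm Aiden and it's very nice to meet you",
    "speaker": "Aiden",
    "language": "auto",
}

body = build_request_body(example_input)
```

The resulting `body` string could then be sent with whatever HTTP client or SDK the hosting service supports.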
Input Parameters
- mode: TTS mode: 'custom_voice' uses preset speakers, 'voice_clone' clones from reference audio, 'voice_design' creates voice from description
- text (required): Text to synthesize into speech
- speaker: Preset speaker voice (only for 'custom_voice' mode)
- language: Language of the text (use 'auto' for automatic detection)
- reference_text: Transcript of the reference audio (recommended for 'voice_clone' mode)
- reference_audio: Reference audio file for voice cloning (only for 'voice_clone' mode)
- style_instruction: Optional style/emotion instruction (e.g., 'speak slowly and calmly', 'excited tone')
- voice_description: Natural language description of desired voice (only for 'voice_design' mode). Example: 'A warm, friendly female voice with a slight British accent'
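Because several parameters apply to only one mode, it can help to assemble the input with a small helper that keeps only the fields relevant to the chosen mode. The sketch below is one possible way to do that. `build_tts_input` is a hypothetical helper, not part of the model. It also makes three assumptions this page does not confirm: that `language` defaults to 'auto', that `reference_audio` is mandatory in 'voice_clone' mode, and that `voice_description` is mandatory in 'voice_design' mode.

```python
from typing import Optional

MODES = ("custom_voice", "voice_clone", "voice_design")


def build_tts_input(
    mode: str,
    text: str,
    speaker: Optional[str] = None,
    language: str = "auto",  # assumed default
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    style_instruction: Optional[str] = None,
    voice_description: Optional[str] = None,
) -> dict:
    """Build an input dict, keeping only the fields that apply to `mode`."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if not text:
        raise ValueError("text is required")

    payload = {"mode": mode, "text": text, "language": language}
    if style_instruction:
        payload["style_instruction"] = style_instruction

    if mode == "custom_voice":
        if speaker:
            payload["speaker"] = speaker
    elif mode == "voice_clone":
        # Assumption: cloning cannot proceed without reference audio.
        if not reference_audio:
            raise ValueError("voice_clone requires reference_audio")
        payload["reference_audio"] = reference_audio
        if reference_text:  # recommended, not required
            payload["reference_text"] = reference_text
    else:  # voice_design
        # Assumption: a description is needed to design a voice.
        if not voice_description:
            raise ValueError("voice_design requires voice_description")
        payload["voice_description"] = voice_description
    return payload
```

For example, `build_tts_input("custom_voice", "Hello", speaker="Aiden")` produces a dict with `mode`, `text`, `language`, and `speaker`. Fields that belong to a different mode are silently dropped rather than rejected, which is a design choice of this sketch and not documented behavior of the model.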
Output Schema
Output
Example Execution Logs
Setting `pad_token_id` to `eos_token_id`:2150 for open-end generation.
Version Details
- Version ID: 501be1210291d541fb5656bbe4808e6290470741029a34004f19e20f6d2365e8
- Version Created: January 23, 2026