inworld/tts-1.5-mini
About
Ultra-fast, cost-efficient text-to-speech with ~120ms latency and 15-language support
Example Output
Performance Metrics
- Prediction Time: 1.48s
- Total Time: 1.50s
All Input Parameters
{
"text": "Welcome to the future of voice AI. Inworld's text-to-speech technology brings natural, expressive speech to any application.",
"voice_id": "Ashley"
}
Input Parameters
- text (required)
  - The text to convert to speech. Maximum 2,000 characters. Supports SSML break tags for pauses (e.g. `<break time="1s" />`), emotion markups (e.g. `[happy]`, `[sad]`), and non-verbal vocalizations (e.g. `[laugh]`, `[sigh]`).
- voice_id
  - The voice to use. Use a preset voice name (e.g. 'Ashley', 'Dennis', 'Alex') or a custom cloned voice ID.
- sample_rate
  - Audio sample rate in Hz.
- temperature
  - Controls randomness when generating audio. Higher values produce more expressive results, lower values are more deterministic. Set to 0 to use the model default (1.1).
- audio_format
  - Output audio format.
- speaking_rate
  - Speaking speed multiplier. Set to 0 for normal speed (1.0).
- text_normalization
  - Controls whether numbers, dates, and abbreviations are expanded before synthesis. 'auto' lets the model decide, 'on' always normalizes, 'off' reads text as-is.
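The constraints in the parameter list above (the 2,000-character limit on `text` and the three `text_normalization` values) can be checked on the client before a request is sent. The following is a minimal sketch in Python; `split_text` and `build_input` are hypothetical helper names, not part of any Inworld or hosting-platform SDK, and the sentence-boundary splitting rule is an assumption made for illustration.

```python
import re

MAX_CHARS = 2000  # per-request limit on `text`, taken from the parameter list
NORMALIZATION_MODES = {"auto", "on", "off"}


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; falls back to a hard cut for any
    single sentence that is longer than the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def build_input(text, voice_id=None, temperature=None,
                speaking_rate=None, text_normalization=None):
    """Assemble an input dict, including only the fields that were set."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters; split it first")
    if text_normalization is not None and text_normalization not in NORMALIZATION_MODES:
        raise ValueError("text_normalization must be 'auto', 'on', or 'off'")

    payload = {"text": text}
    optional = {
        "voice_id": voice_id,
        "temperature": temperature,        # 0 means "use the model default"
        "speaking_rate": speaking_rate,    # 0 means "normal speed"
        "text_normalization": text_normalization,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

A call such as `build_input('Hello. <break time="1s" /> [happy] Nice to meet you.', voice_id="Ashley")` returns a dict with the same shape as the example input shown earlier. Longer scripts can be passed through `split_text` first, with one request per chunk.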
Output Schema
Output
Example Execution Logs
Synthesized 0.11MB audio in 1.21sec
Processed characters: 124
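The log output above carries enough information to estimate throughput. As a rough sketch, the Python snippet below extracts the figures and derives characters per second. `parse_log` is a hypothetical helper, and the regular expressions assume the log wording stays exactly as in this single example, which may not hold for other runs.

```python
import re


def parse_log(log: str) -> dict:
    """Extract size, duration, and character count from the example log text."""
    synth = re.search(r"Synthesized\s+([\d.]+)MB audio in\s+([\d.]+)sec", log)
    chars = re.search(r"Processed characters:\s*(\d+)", log)
    if not synth or not chars:
        raise ValueError("unrecognized log format")

    seconds = float(synth.group(2))
    n_chars = int(chars.group(1))
    return {
        "megabytes": float(synth.group(1)),
        "seconds": seconds,
        "characters": n_chars,
        "chars_per_second": n_chars / seconds,
    }


stats = parse_log("Synthesized 0.11MB audio in 1.21sec Processed characters: 124")
print(f"{stats['chars_per_second']:.1f} characters per second")
```

For the run shown here that works out to roughly 102 characters per second (124 characters in 1.21 seconds).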
Version Details
- Version ID: 787e87b8a178054348a86663750877f97187237604ad294931b6447c3d3f680c
- Version Created: April 16, 2026