inworld/tts-1.5-mini

⭐ Official ▶️ 13.7K runs 📅 Mar 2026 ⚙️ Cog 0.18.0
multilingual text-to-speech voice-cloning

About

Ultra-fast, cost-efficient text-to-speech with ~120ms latency and 15-language support

Example Output

Performance Metrics

Prediction Time: 1.48s
Total Time: 1.50s

All Input Parameters
{
  "text": "Welcome to the future of voice AI. Inworld's text-to-speech technology brings natural, expressive speech to any application.",
  "voice_id": "Ashley"
}
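
The input above could be sent with the Replicate Python client. The following is a minimal sketch, assuming the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The `synthesize` helper and its `runner` parameter are hypothetical names introduced here so the call can be exercised without network access; only `replicate.run` belongs to the client itself.

```python
from typing import Any, Callable, Optional

MODEL = "inworld/tts-1.5-mini"  # model name as shown on this page


def synthesize(text: str, voice_id: str = "Ashley",
               runner: Optional[Callable[..., Any]] = None) -> Any:
    """Send a text-to-speech request and return whatever the runner returns.

    `runner` is a hypothetical injection point for testing. When omitted,
    the function falls back to `replicate.run`, which needs the `replicate`
    package and a REPLICATE_API_TOKEN environment variable.
    """
    if runner is None:
        import replicate  # imported lazily so the sketch loads without it
        runner = replicate.run
    return runner(MODEL, input={"text": text, "voice_id": voice_id})
```

Depending on the client version, the model reference may need the Version ID listed under Version Details appended in the form `owner/name:version`.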
Input Parameters
text (required) Type: string
The text to convert to speech. Maximum 2,000 characters. Supports SSML break tags for pauses (e.g. `<break time="1s" />`), emotion markups (e.g. `[happy]`, `[sad]`), and non-verbal vocalizations (e.g. `[laugh]`, `[sigh]`).
voice_id Type: string Default: Ashley
The voice to use. Use a preset voice name (e.g. 'Ashley', 'Dennis', 'Alex') or a custom cloned voice ID.
sample_rate Default: 48000
Audio sample rate in Hz.
temperature Type: number Default: 0 Range: 0 - 2
Controls randomness when generating audio. Higher values produce more expressive results, lower values are more deterministic. Set to 0 to use the model default (1.1).
audio_format Default: mp3
Output audio format.
speaking_rate Type: number Default: 0 Range: 0 - 1.5
Speaking speed multiplier. Set to 0 for normal speed (1.0).
text_normalization Default: auto
Controls whether numbers, dates, and abbreviations are expanded before synthesis. 'auto' lets the model decide, 'on' always normalizes, 'off' reads text as-is.
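
The limits listed above can be checked on the client before a request is sent. The sketch below is one possible way to do that, not part of the model's API: `build_input`, `effective_settings`, and `join_with_pauses` are hypothetical helper names. It assumes the 2,000-character limit applies to the raw string including markup, which the page does not state. `sample_rate` and `audio_format` are left out because only their defaults are documented.

```python
from typing import Any, Dict, List

MAX_TEXT_CHARS = 2000        # "Maximum 2,000 characters"
DEFAULT_TEMPERATURE = 1.1    # used by the model when temperature is 0
DEFAULT_SPEAKING_RATE = 1.0  # used by the model when speaking_rate is 0


def build_input(text: str, voice_id: str = "Ashley", temperature: float = 0,
                speaking_rate: float = 0,
                text_normalization: str = "auto") -> Dict[str, Any]:
    """Validate values against the documented ranges and return an input dict."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text has {len(text)} characters; limit is {MAX_TEXT_CHARS}")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= speaking_rate <= 1.5:
        raise ValueError("speaking_rate must be between 0 and 1.5")
    if text_normalization not in ("auto", "on", "off"):
        raise ValueError("text_normalization must be 'auto', 'on', or 'off'")
    return {
        "text": text,
        "voice_id": voice_id,
        "temperature": temperature,
        "speaking_rate": speaking_rate,
        "text_normalization": text_normalization,
    }


def effective_settings(payload: Dict[str, Any]) -> Dict[str, float]:
    """Resolve the 0 sentinels to the defaults documented on this page."""
    return {
        "temperature": payload["temperature"] or DEFAULT_TEMPERATURE,
        "speaking_rate": payload["speaking_rate"] or DEFAULT_SPEAKING_RATE,
    }


def join_with_pauses(sentences: List[str], seconds: int = 1) -> str:
    """Join sentences with SSML break tags in the documented form."""
    tag = f'<break time="{seconds}s" />'
    return f" {tag} ".join(sentences)
```

For example, `build_input(join_with_pauses(["[happy] Hello there.", "How are you?"]))` would produce a payload whose text contains a one-second pause between the two sentences.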
Output Schema

Output

Type: string Format: uri
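
Because the output is a URI string, the audio has to be fetched separately. A minimal sketch using only the Python standard library follows; `save_audio` is a hypothetical helper name. Some versions of the Replicate Python client may return a file-like object rather than a plain URL string, in which case this helper would not apply directly.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio(uri: str, dest: str) -> Path:
    """Download the audio at `uri` and write it to `dest`."""
    path = Path(dest)
    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(uri) as response, path.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return path
```

With the default `audio_format`, a destination such as `speech.mp3` would be a natural choice.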

Example Execution Logs
Synthesized 0.11MB audio in 1.21sec
Processed characters: 124
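
These log lines can be turned into a rough throughput figure. The sketch below assumes the log format is exactly as in the two lines shown, which may not hold for other runs; `parse_logs` is a hypothetical helper name.

```python
import re
from typing import Dict, List


def parse_logs(lines: List[str]) -> Dict[str, float]:
    """Extract size, duration, and character count from the example log lines."""
    stats: Dict[str, float] = {}
    for line in lines:
        m = re.search(r"Synthesized ([\d.]+)MB audio in ([\d.]+)sec", line)
        if m:
            stats["size_mb"] = float(m.group(1))
            stats["seconds"] = float(m.group(2))
        m = re.search(r"Processed characters: (\d+)", line)
        if m:
            stats["characters"] = float(m.group(1))
    # Derive characters per second only when both values were found.
    if stats.get("seconds") and "characters" in stats:
        stats["chars_per_second"] = stats["characters"] / stats["seconds"]
    return stats
```

For the example run, 124 characters in 1.21 seconds works out to roughly 102 characters per second of synthesis time.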
Version Details
Version ID
787e87b8a178054348a86663750877f97187237604ad294931b6447c3d3f680c
Version Created
April 16, 2026