inworld/tts-1.5-max

⭐ Official ▶️ 46.9K runs 📅 Mar 2026 ⚙️ Cog 0.18.0
multilingual text-to-speech voice-cloning

About

Highest-quality text-to-speech with <200ms latency, emotion control, and 15-language support

Example Output

Performance Metrics

2.12s Prediction Time
2.14s Total Time
All Input Parameters
{
  "text": "Welcome to the future of voice AI. Inworld's text-to-speech technology brings natural, expressive speech to any application.",
  "voice_id": "Ashley"
}
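The example input above could be sent to the model with the Replicate Python client. This is a minimal sketch, not an official snippet: it assumes the `replicate` package is installed, that `REPLICATE_API_TOKEN` is set in the environment, and that the model can be addressed by name without a version hash. `synthesize` is a hypothetical helper name.

```python
import os

MODEL = "inworld/tts-1.5-max"

# The same input shown in the example above.
EXAMPLE_INPUT = {
    "text": (
        "Welcome to the future of voice AI. Inworld's text-to-speech "
        "technology brings natural, expressive speech to any application."
    ),
    "voice_id": "Ashley",
}


def synthesize(payload: dict):
    """Run the model once and return whatever the client returns.

    Assumption: the `replicate` package is installed and the
    REPLICATE_API_TOKEN environment variable holds a valid token.
    """
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("REPLICATE_API_TOKEN is not set")
    import replicate  # imported lazily so this file loads without the package

    return replicate.run(MODEL, input=payload)
```

Calling `synthesize(EXAMPLE_INPUT)` makes a billable request, so the sketch only defines the function and leaves the call to the reader.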
Input Parameters
text (required) Type: string
The text to convert to speech. Maximum 2,000 characters. Supports SSML break tags for pauses (e.g. `<break time="1s" />`), emotion markups (e.g. `[happy]`, `[sad]`), and non-verbal vocalizations (e.g. `[laugh]`, `[sigh]`).
voice_id Type: string Default: Ashley
The voice to use. Use a preset voice name (e.g. 'Ashley', 'Dennis', 'Alex') or a custom cloned voice ID.
sample_rate Default: 48000
Audio sample rate in Hz.
temperature Type: number Default: 0 Range: 0 - 2
Controls randomness when generating audio. Higher values produce more expressive results, lower values are more deterministic. Set to 0 to use the model default (1.1).
audio_format Default: mp3
Output audio format.
speaking_rate Type: number Default: 0 Range: 0 - 1.5
Speaking speed multiplier. Set to 0 for normal speed (1.0).
text_normalization Default: auto
Controls whether numbers, dates, and abbreviations are expanded before synthesis. 'auto' lets the model decide, 'on' always normalizes, 'off' reads text as-is.
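The limits listed above can be checked on the client before a request is sent. The sketch below is one possible way to do that, not part of the model's API: `build_input` and `effective_values` are hypothetical helpers, and counting the 2,000-character limit with `len()` (markup included) is an assumption. `sample_rate` and `audio_format` are left out because the page lists only their defaults, not their allowed values.

```python
MAX_TEXT_CHARS = 2000            # limit stated for `text`
MODEL_DEFAULT_TEMPERATURE = 1.1  # what temperature=0 resolves to
NORMAL_SPEAKING_RATE = 1.0       # what speaking_rate=0 resolves to
NORMALIZATION_MODES = {"auto", "on", "off"}


def build_input(text, voice_id="Ashley", temperature=0.0,
                speaking_rate=0.0, text_normalization="auto"):
    """Validate values against the documented limits and return an input dict."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text has {len(text)} characters; the limit is {MAX_TEXT_CHARS}"
        )
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= speaking_rate <= 1.5:
        raise ValueError("speaking_rate must be between 0 and 1.5")
    if text_normalization not in NORMALIZATION_MODES:
        raise ValueError("text_normalization must be 'auto', 'on', or 'off'")
    return {
        "text": text,
        "voice_id": voice_id,
        "temperature": temperature,
        "speaking_rate": speaking_rate,
        "text_normalization": text_normalization,
    }


def effective_values(payload):
    """Resolve the 0 sentinels to the values the page says they stand for."""
    return {
        "temperature": payload["temperature"] or MODEL_DEFAULT_TEMPERATURE,
        "speaking_rate": payload["speaking_rate"] or NORMAL_SPEAKING_RATE,
    }


# Text combining an emotion markup, an SSML break, and a non-verbal vocalization.
payload = build_input(
    '[happy] Your order has shipped. <break time="1s" /> [laugh] Finally!'
)
```

Note that 0 acts as a sentinel for both `temperature` and `speaking_rate`, so a truly silent or frozen setting cannot be requested; `effective_values` makes the resolved numbers explicit.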
Output Schema

Output

Type: string Format: uri
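Because the output is a URI rather than raw audio, the file still has to be fetched. A minimal sketch using only the standard library follows; `save_audio` is a hypothetical helper, and the fallback file name assumes the default `mp3` format. Depending on the client version, the value returned by a client library may be a wrapper object rather than a plain string, so it may need converting before being passed in.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def save_audio(uri: str, dest_dir: str = ".",
               fallback_name: str = "output.mp3") -> Path:
    """Download the audio at `uri` into `dest_dir` and return the local path."""
    # Reuse the file name from the URI path; fall back if the path has none.
    name = Path(urlparse(uri).path).name or fallback_name
    dest = Path(dest_dir) / name
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    return dest
```

Streaming with `shutil.copyfileobj` avoids holding the whole file in memory, which matters little for a 0.12 MB clip but keeps the helper safe for longer audio.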

Example Execution Logs
Synthesized 0.12MB audio in 1.88sec
Processed characters: 124
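The two log lines above carry enough information for a rough throughput figure. The sketch below parses them and divides characters by synthesis time; the regular expressions are inferred from this single example, so the log format is an assumption and may differ between runs or versions. `parse_log` is a hypothetical helper.

```python
import re

LOG = """Synthesized 0.12MB audio in 1.88sec
Processed characters: 124"""


def parse_log(log: str) -> dict:
    """Extract audio size, synthesis time, and character count from the log."""
    synth = re.search(r"Synthesized ([\d.]+)MB audio in ([\d.]+)sec", log)
    chars = re.search(r"Processed characters: (\d+)", log)
    if synth is None or chars is None:
        raise ValueError("log does not match the expected format")
    return {
        "megabytes": float(synth.group(1)),
        "seconds": float(synth.group(2)),
        "characters": int(chars.group(1)),
    }


stats = parse_log(LOG)
# 124 characters / 1.88 s, roughly 66 characters per second
chars_per_second = stats["characters"] / stats["seconds"]
```

The 124 processed characters match the length of the example `text` input, which suggests the count covers the full input string.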
Version Details
Version ID
4a2e51066a48d694207736b7598590ac44a8c780d3ee50147951c5ddf3e52e1d
Version Created
April 16, 2026