inworld/tts-1.5-max

⭐ Official ▶️ 46.9K runs 📅 Mar 2026 ⚙️ Cog 0.18.0

About

Highest-quality text-to-speech with <200ms latency, emotion control, and 15-language support

Example Output

Performance Metrics

2.12s Prediction Time

2.14s Total Time

All Input Parameters

{
  "text": "Welcome to the future of voice AI. Inworld's text-to-speech technology brings natural, expressive speech to any application.",
  "voice_id": "Ashley"
}
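
As a rough illustration of how the parameters above might be sent to the model, here is a minimal sketch. It assumes the third-party `replicate` Python client is installed and that a `REPLICATE_API_TOKEN` environment variable is set; the helper names `build_input` and `run_tts` are hypothetical and not part of this page. The network call is left commented out so the snippet runs offline.

```python
import json

MODEL = "inworld/tts-1.5-max"  # model name as shown on this page


def build_input(text: str, voice_id: str = "Ashley") -> dict:
    """Return an input payload in the same shape as the example above."""
    return {"text": text, "voice_id": voice_id}


def run_tts(payload: dict):
    """Send the payload to Replicate (needs network access and an API token)."""
    # Assumption: the `replicate` package is installed (`pip install replicate`).
    import replicate

    return replicate.run(MODEL, input=payload)


payload = build_input("Welcome to the future of voice AI.")
print(json.dumps(payload, indent=2))
# output = run_tts(payload)  # uncomment to make a real request
```

Depending on the client version, the returned value may be a plain URL string or a file-like wrapper around it, so treat the return type as something to check rather than rely on.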

Input Parameters

text (required) • Type: string
The text to convert to speech. Maximum 2,000 characters. Supports SSML break tags for pauses (e.g. `<break time="1s" />`), emotion markups (e.g. `[happy]`, `[sad]`), and non-verbal vocalizations (e.g. `[laugh]`, `[sigh]`).

voice_id • Type: string • Default: Ashley
The voice to use. Use a preset voice name (e.g. 'Ashley', 'Dennis', 'Alex') or a custom cloned voice ID.

sample_rate • Default: 48000
Audio sample rate in Hz.

temperature • Type: number • Default: 0 • Range: 0 - 2
Controls randomness when generating audio. Higher values produce more expressive results, lower values are more deterministic. Set to 0 to use the model default (1.1).

audio_format • Default: mp3
Output audio format.

speaking_rate • Type: number • Default: 0 • Range: 0 - 1.5
Speaking speed multiplier. Set to 0 for normal speed (1.0).

text_normalization • Default: auto
Controls whether numbers, dates, and abbreviations are expanded before synthesis. 'auto' lets the model decide, 'on' always normalizes, 'off' reads text as-is.
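
Several of these parameters use 0 as a "use the default" sentinel, which is easy to misread. The sketch below shows one way to validate inputs client-side and resolve the sentinel values before sending a request. The function names are hypothetical, and counting markup characters toward the 2,000-character limit is an assumption; the page does not say how the limit is measured.

```python
MAX_CHARS = 2000                 # "Maximum 2,000 characters"
MODEL_DEFAULT_TEMPERATURE = 1.1  # used when temperature is 0
NORMAL_SPEAKING_RATE = 1.0       # used when speaking_rate is 0


def check_text(text: str) -> str:
    """Reject empty or over-long text before making a request."""
    if not text:
        raise ValueError("text is required")
    # Assumption: every character counts, including SSML and [emotion] markup.
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters; limit is {MAX_CHARS}")
    return text


def effective_settings(temperature: float = 0, speaking_rate: float = 0) -> dict:
    """Range-check both values and map the 0 sentinel to the documented default."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= speaking_rate <= 1.5:
        raise ValueError("speaking_rate must be between 0 and 1.5")
    return {
        "temperature": MODEL_DEFAULT_TEMPERATURE if temperature == 0 else temperature,
        "speaking_rate": NORMAL_SPEAKING_RATE if speaking_rate == 0 else speaking_rate,
    }


print(effective_settings())           # both sentinels resolved
print(effective_settings(0.7, 1.25))  # explicit values pass through
```

Note that a temperature of 0 therefore does not mean "fully deterministic" here; it selects the model default of 1.1.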

Output Schema

Output

Type: string • Format: uri
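
Because the output is a URI string rather than raw audio bytes, the audio has to be fetched in a second step. Below is a minimal sketch using only the Python standard library; `save_audio` is a hypothetical helper, and the demo uses a local `file://` URI as a stand-in for the real output URL so that it runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def save_audio(uri: str, dest: Path) -> Path:
    """Stream the resource at `uri` into `dest` and return the destination path."""
    with urllib.request.urlopen(uri) as response, dest.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest


# Offline demo: a local file stands in for the URI the model would return.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.mp3"
    source.write_bytes(b"fake mp3 bytes")
    saved = save_audio(source.as_uri(), Path(tmp) / "speech.mp3")
    copied = saved.read_bytes()

print(len(copied), "bytes saved")
```

With a real prediction, the first argument would be the returned URI and the file extension should match the requested `audio_format`.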

Example Execution Logs

Synthesized 0.12MB audio in 1.88sec
Processed characters: 124
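
For rough throughput estimates, the two log lines above can be parsed into numbers. This is only a sketch: the regular expressions are inferred from this single example, and the log format is not documented as stable, so `parse_log` (a hypothetical helper) may need adjusting.

```python
import re

LOG = """Synthesized 0.12MB audio in 1.88sec
Processed characters: 124"""


def parse_log(log: str) -> dict:
    """Extract audio size, synthesis time, and character count from the log text."""
    synth = re.search(r"Synthesized ([\d.]+)MB audio in ([\d.]+)sec", log)
    chars = re.search(r"Processed characters: (\d+)", log)
    if synth is None or chars is None:
        raise ValueError("unrecognized log format")
    return {
        "size_mb": float(synth.group(1)),
        "seconds": float(synth.group(2)),
        "characters": int(chars.group(1)),
    }


stats = parse_log(LOG)
chars_per_second = stats["characters"] / stats["seconds"]
print(stats)
print(f"{chars_per_second:.1f} characters per second")
```

For the example run this works out to about 66 characters per second of synthesis time.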

Version Details

Version ID: 4a2e51066a48d694207736b7598590ac44a8c780d3ee50147951c5ddf3e52e1d
Version Created: April 16, 2026
