inworld/tts-1.5-mini 📝❓🔢 → 🖼️

⭐ Official ▶️ 13.7K runs 📅 Mar 2026 ⚙️ Cog 0.18.0

About

Ultra-fast, cost-efficient text-to-speech with ~120ms latency and 15-language support

Example Output

Performance Metrics

1.48s Prediction Time

1.50s Total Time

All Input Parameters

{
  "text": "Welcome to the future of voice AI. Inworld's text-to-speech technology brings natural, expressive speech to any application.",
  "voice_id": "Ashley"
}

Input Parameters

text (required) Type: string: The text to convert to speech. Maximum 2,000 characters. Supports SSML break tags for pauses (e.g. `<break time="1s" />`), emotion markups (e.g. `[happy]`, `[sad]`), and non-verbal vocalizations (e.g. `[laugh]`, `[sigh]`).
voice_id Type: string • Default: Ashley: The voice to use. Use a preset voice name (e.g. 'Ashley', 'Dennis', 'Alex') or a custom cloned voice ID.
sample_rate Default: 48000: Audio sample rate in Hz.
temperature Type: number • Default: 0 • Range: 0 - 2: Controls randomness when generating audio. Higher values produce more expressive results, lower values are more deterministic. Set to 0 to use the model default (1.1).
audio_format Default: mp3: Output audio format.
speaking_rate Type: number • Default: 0 • Range: 0 - 1.5: Speaking speed multiplier. Set to 0 for normal speed (1.0).
text_normalization Default: auto: Controls whether numbers, dates, and abbreviations are expanded before synthesis. 'auto' lets the model decide, 'on' always normalizes, 'off' reads text as-is.
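
The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model itself: `build_input` is a hypothetical helper, and it assumes the 2,000-character limit applies to the raw string including any markup. The commented-out call at the end assumes the `replicate` Python package and a `REPLICATE_API_TOKEN` environment variable.

```python
from typing import Any, Dict

MAX_TEXT_CHARS = 2000  # documented maximum for `text`
TEXT_NORMALIZATION_MODES = {"auto", "on", "off"}


def build_input(
    text: str,
    voice_id: str = "Ashley",
    temperature: float = 0,
    speaking_rate: float = 0,
    text_normalization: str = "auto",
) -> Dict[str, Any]:
    """Validate values against the documented limits and return an input dict.

    Hypothetical helper: the model does not ship this function.
    """
    if not text:
        raise ValueError("text is required")
    # Assumption: the limit counts every character, markup included.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text has {len(text)} characters; the limit is {MAX_TEXT_CHARS}"
        )
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= speaking_rate <= 1.5:
        raise ValueError("speaking_rate must be between 0 and 1.5")
    if text_normalization not in TEXT_NORMALIZATION_MODES:
        raise ValueError("text_normalization must be 'auto', 'on', or 'off'")
    return {
        "text": text,
        "voice_id": voice_id,
        "temperature": temperature,
        "speaking_rate": speaking_rate,
        "text_normalization": text_normalization,
    }


# Text combining a break tag and an emotion markup, as described for `text`.
payload = build_input(
    'Hello there. <break time="1s" /> [happy] Nice to meet you!',
    temperature=0.8,
)

# Possible invocation (requires network access and an API token):
# import replicate
# output = replicate.run("inworld/tts-1.5-mini", input=payload)
```

Leaving `temperature` and `speaking_rate` at 0 defers to the model defaults (1.1 and 1.0), so the helper passes 0 through unchanged rather than substituting those values.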

Output Schema

Output

Type: string • Format: uri
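
Because the output is a URI rather than raw audio bytes, the audio has to be fetched in a second step. A minimal sketch using only the Python standard library follows; `save_audio` is a hypothetical helper name, and the sketch assumes the URI is already available as a plain string.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio(uri: str, destination: str) -> Path:
    """Stream the resource at `uri` into `destination` and return its path."""
    target = Path(destination)
    # Copy in chunks so the whole file is never held in memory at once.
    with urllib.request.urlopen(uri) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Possible usage with the default mp3 format, given an output URI string:
# save_audio(output_uri, "speech.mp3")
```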

Example Execution Logs

Synthesized 0.11MB audio in 1.21sec
Processed characters: 124
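
The two log lines carry enough information to estimate throughput. The sketch below parses them with regular expressions; the patterns are inferred from this single example, so other runs may log in a different format, and `parse_log` is a hypothetical helper.

```python
import re

LOG = """Synthesized 0.11MB audio in 1.21sec
Processed characters: 124"""


def parse_log(log: str) -> dict:
    """Extract audio size, synthesis time, and character count from a log."""
    synth = re.search(r"Synthesized ([\d.]+)MB audio in ([\d.]+)sec", log)
    chars = re.search(r"Processed characters: (\d+)", log)
    if synth is None or chars is None:
        raise ValueError("unexpected log format")
    return {
        "megabytes": float(synth.group(1)),
        "seconds": float(synth.group(2)),
        "characters": int(chars.group(1)),
    }


stats = parse_log(LOG)
# Characters synthesized per second of processing time.
chars_per_second = stats["characters"] / stats["seconds"]
print(f"{chars_per_second:.1f} characters/sec")
```

The character count in the log matches the length of the example `text` input, which is 124 characters.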

Version Details

Version ID: 787e87b8a178054348a86663750877f97187237604ad294931b6447c3d3f680c
Version Created: April 16, 2026
