minimax/speech-2.6-turbo
About
Low-latency MiniMax Speech 2.6 Turbo brings multilingual, emotional text-to-speech to Replicate with 300+ voices and real-time friendly pricing.
Example Output
Output
Performance Metrics
Prediction Time: 3.30s
Total Time: 3.32s
All Input Parameters
{
"text": "Minimax just released Speech 2.6, It's really good, It builds on top of what existed before, The HD version is perfectly optimized for high-fidelity applications like voiceovers and audiobooks, And the Turbo variant is better for real-time applications with low latency.",
"pitch": 0,
"speed": 1,
"volume": 1,
"bitrate": 128000,
"channel": "mono",
"emotion": "auto",
"voice_id": "Wise_Woman",
"sample_rate": 32000,
"audio_format": "mp3",
"output_format": "hex",
"language_boost": "None",
"subtitle_enable": false,
"english_normalization": false
}
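The input above can be sent from the Replicate Python client roughly as follows. This is a minimal sketch, assuming the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment. `build_input` and `synthesize` are illustrative helper names, not part of any API, and the type of the returned value depends on the client version, so it is passed back untouched.

```python
"""Sketch: call minimax/speech-2.6-turbo through the Replicate Python client."""

MODEL = "minimax/speech-2.6-turbo"  # model slug taken from this page

# Example values copied from the "All Input Parameters" block on this page.
EXAMPLE_INPUT = {
    "pitch": 0,
    "speed": 1,
    "volume": 1,
    "bitrate": 128000,
    "channel": "mono",
    "emotion": "auto",
    "voice_id": "Wise_Woman",
    "sample_rate": 32000,
    "audio_format": "mp3",
    "output_format": "hex",
    "language_boost": "None",
    "subtitle_enable": False,
    "english_normalization": False,
}


def build_input(text, **overrides):
    """Return an input dict: the page's example values plus any overrides.

    Hypothetical helper; it rejects keys that do not appear on this page
    so that typos such as "voice" instead of "voice_id" fail early.
    """
    unknown = set(overrides) - set(EXAMPLE_INPUT)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    payload = {"text": text, **EXAMPLE_INPUT}
    payload.update(overrides)
    return payload


def synthesize(text, **overrides):
    """Run one prediction and return whatever the client returns."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL, input=build_input(text, **overrides))
```

Calling `synthesize("Hello from Speech 2.6 Turbo.", speed=1.2)` issues a real prediction, so it is kept out of the module body; `build_input` can be exercised offline.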
Input Parameters
- text (required)
- Text to narrate (max 10,000 characters). Use markers like <#0.5#> to insert pauses in seconds.
- pitch
- Semitone offset applied to the voice (-12 to +12).
- speed
- Speech speed multiplier (0.5–2.0). Lower is slower, higher is faster.
- volume
- Relative loudness. 1.0 is default MiniMax gain. Range 0–10.
- bitrate
- MP3 bitrate in bits per second. Only used when audio_format is mp3.
- channel
- mono for 1 channel (default), stereo for 2 channels.
- emotion
- Desired delivery style. Use auto to let MiniMax choose, or pick a specific emotion.
- voice_id
- Voice to synthesize. Pick any MiniMax system voice or a voice_id returned by https://replicate.com/minimax/voice-cloning.
- sample_rate
- Audio sample rate in Hz.
- audio_format
- File format for the generated audio. Choose mp3 for general use, wav/flac for lossless, or pcm for raw bytes.
- language_boost
- Optional language hint. Choose Automatic to let MiniMax detect the language, or pick a specific locale.
- subtitle_enable
- Return MiniMax subtitle metadata with sentence timestamps (non-streaming only).
- english_normalization
- Improve number/date reading for English text (adds a small amount of latency).
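The documented limits can be checked locally before a request is sent. The sketch below assumes the ranges are inclusive and that pause markers count toward the 10,000-character limit; neither is stated on this page. `join_with_pauses` and `validate_input` are illustrative names, and the single spaces placed around each pause marker are a formatting choice, not a documented requirement.

```python
MAX_TEXT_CHARS = 10_000  # "max 10,000 characters" from the text parameter

# Numeric ranges as listed in the parameter descriptions (assumed inclusive).
NUMERIC_RANGES = {
    "pitch": (-12, 12),
    "speed": (0.5, 2.0),
    "volume": (0, 10),
}

# Values named in the channel and audio_format descriptions.
CHOICES = {
    "channel": ("mono", "stereo"),
    "audio_format": ("mp3", "wav", "flac", "pcm"),
}


def join_with_pauses(sentences, seconds=0.5):
    """Join sentences with a <#seconds#> pause marker between each pair."""
    marker = f" <#{seconds:g}#> "
    return marker.join(s.strip() for s in sentences)


def validate_input(params):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    text = params.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {', '.join(allowed)}")
    return problems
```

For example, `validate_input({"text": "hi", "pitch": 13})` returns one message about `pitch`, and `join_with_pauses(["Hello.", "World."])` produces `Hello. <#0.5#> World.`.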
Output Schema
Output
Example Execution Logs
Generating speech with model speech-2.6-turbo
Generated speech in 3.28sec
Each character is 1 token
Tokens: 270
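The "Each character is 1 token" log line suggests usage can be estimated before submitting text. A minimal sketch, assuming Python's `len` (Unicode code points) matches how the service counts characters; the page does not confirm this for non-ASCII text or pause markers. `estimate_tokens` and `fits_limit` are illustrative names.

```python
def estimate_tokens(text: str) -> int:
    """Estimate tokens under the logged rule of one token per character."""
    return len(text)


def fits_limit(text: str, max_chars: int = 10_000) -> bool:
    """Check the estimate against the documented 10,000-character text limit."""
    return estimate_tokens(text) <= max_chars
```

Applied to the example `text` value under All Input Parameters, `estimate_tokens` gives 270, which agrees with the logged token count.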
Version Details
- Version ID
- 24c0b2d2819faa5ce6eff09fc136c625e6e8c90e6f8a1cca75845f26fe9e1c4e
- Version Created
- November 7, 2025