minimax/speech-02-hd
About
A Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. It is optimized for high-fidelity applications such as voiceovers and audiobooks.

Example Output
Performance Metrics
- Prediction Time: 2.38s
- Total Time: 2.39s
All Input Parameters
{
  "text": "Speech-02-series is a Text-to-Audio and voice cloning technology that offers voice synthesis, emotional expression, and multilingual capabilities.\n\nThe HD version is optimized for high-fidelity applications like voiceovers and audiobooks. While the turbo one is designed for real-time applications with low latency.\n\nWhen using this model on Replicate, each character represents 1 token.",
  "pitch": 0,
  "speed": 1,
  "volume": 1,
  "bitrate": 128000,
  "channel": "mono",
  "emotion": "happy",
  "voice_id": "Friendly_Person",
  "sample_rate": 32000,
  "language_boost": "English",
  "english_normalization": true
}
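As a rough illustration of how these parameters might be sent to the model, the sketch below uses the `replicate` Python client. The `text` value is a short placeholder rather than the full example text, `run_example` is a hypothetical helper name, and the call assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment.

```python
import os

MODEL = "minimax/speech-02-hd"

# Parameter values copied from the example input; "text" is a short
# placeholder, not the original example text.
EXAMPLE_INPUT = {
    "text": "Hello from speech-02-hd.",
    "pitch": 0,
    "speed": 1,
    "volume": 1,
    "bitrate": 128000,
    "channel": "mono",
    "emotion": "happy",
    "voice_id": "Friendly_Person",
    "sample_rate": 32000,
    "language_boost": "English",
    "english_normalization": True,
}


def run_example():
    """Hypothetical helper: send EXAMPLE_INPUT to the model.

    Needs network access, an API token, and the third-party
    `replicate` package, so it is not called at import time.
    """
    if "REPLICATE_API_TOKEN" not in os.environ:
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling the model.")
    import replicate  # imported lazily so the rest of the sketch runs without it

    return replicate.run(MODEL, input=EXAMPLE_INPUT)
```

The type of the value returned by `run_example()` depends on the client version, so it is worth inspecting before writing the audio to disk.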
Input Parameters
- text (required): Text to convert to speech. Every character is 1 token. Maximum 5000 characters. Insert <#x#> between words to add a pause of x seconds (0.01-99.99).
- pitch: Speech pitch
- speed: Speech speed
- volume: Speech volume
- bitrate: Bitrate for the generated speech
- channel: Number of audio channels
- emotion: Speech emotion
- voice_id: Desired voice ID. Use a voice ID you have trained (https://replicate.com/minimax/voice-cloning), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
- sample_rate: Sample rate for the generated speech
- language_boost: Enhance recognition of specific languages and dialects
- english_normalization: Enable English text normalization for better number reading (slightly increases latency)
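The limits on the text parameter can be checked locally before sending a request. The following is a minimal sketch, not part of the model's API: `pause` and `validate_text` are hypothetical helper names, and it assumes that pause markers count toward the 5000-character limit and that a two-decimal marker such as <#0.50#> is accepted.

```python
import re

MAX_CHARS = 5000                    # limit stated for the text parameter
MIN_PAUSE, MAX_PAUSE = 0.01, 99.99  # pause range stated for <#x#>

# Matches markers such as <#0.5#> or <#12.25#> and captures the number.
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def pause(seconds: float) -> str:
    """Build a pause marker (assumption: two decimal places are accepted)."""
    if not MIN_PAUSE <= seconds <= MAX_PAUSE:
        raise ValueError(f"pause must be between {MIN_PAUSE} and {MAX_PAUSE} seconds")
    return f"<#{seconds:.2f}#>"


def validate_text(text: str) -> int:
    """Check length and pause markers; return the character count."""
    if not text:
        raise ValueError("text is required")
    # Assumption: every character, including pause markers, counts.
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters; maximum is {MAX_CHARS}")
    for match in PAUSE_MARKER.finditer(text):
        value = float(match.group(1))
        if not MIN_PAUSE <= value <= MAX_PAUSE:
            raise ValueError(f"pause {match.group(0)} is out of range")
    return len(text)


script = "Welcome back." + pause(0.5) + "Let us begin."
print(validate_text(script))
```

Validating locally avoids spending a request on input the model would reject, although the server remains the authority on what it accepts.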
Output Schema
Output
Example Execution Logs
Generating speech with model speech-02-hd
Generated speech in 2.38sec
Each character is 1 token
Tokens: 387
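The token count in the log can be reproduced from the example text, since each character is 1 token. The sketch below assumes that "character" corresponds to what Python's `len` counts (Unicode code points) and that each newline counts as one character; `count_tokens` is a hypothetical helper name.

```python
# The "text" value from the example input, with "\n\n" between paragraphs.
EXAMPLE_TEXT = (
    "Speech-02-series is a Text-to-Audio and voice cloning technology that "
    "offers voice synthesis, emotional expression, and multilingual capabilities."
    "\n\n"
    "The HD version is optimized for high-fidelity applications like voiceovers "
    "and audiobooks. While the turbo one is designed for real-time applications "
    "with low latency."
    "\n\n"
    "When using this model on Replicate, each character represents 1 token."
)


def count_tokens(text: str) -> int:
    """One token per character (assumption: counted as code points)."""
    return len(text)


print(count_tokens(EXAMPLE_TEXT))  # matches the "Tokens: 387" line in the log
```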
Version Details
- Version ID: 29657f664032844b8f800486164cf26acb2507288e348133e78ae871a43211d0
- Version Created: May 6, 2025