minimax/speech-2.8-hd

⭐ Official ▶️ 62.4K runs 📅 Feb 2026 ⚙️ Cog 0.16.11
text-to-speech voice-cloning

About

MiniMax Speech 2.8 HD is a text-to-speech model focused on high-fidelity audio generation. Its features include studio-grade quality, flexible emotion control, multilingual support, and voice cloning.

Example Output

Performance Metrics

2.09s Prediction Time
5.76s Total Time
All Input Parameters
{
  "text": "Hello, world! This is a simple test of the MiniMax Speech 2.8 model thats now on Replicate",
  "pitch": 0,
  "speed": 1,
  "volume": 1,
  "bitrate": 128000,
  "channel": "mono",
  "emotion": "auto",
  "voice_id": "Wise_Woman",
  "sample_rate": 32000,
  "audio_format": "mp3",
  "language_boost": "None",
  "subtitle_enable": false,
  "english_normalization": false
}
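
The example inputs above could be sent with the Replicate Python client roughly as follows. This is a minimal sketch, not an official snippet: it assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set, and `build_input` is a hypothetical helper introduced here for illustration.

```python
import os

MODEL = "minimax/speech-2.8-hd"


def build_input(text: str, voice_id: str = "Wise_Woman") -> dict:
    """Return an input payload that mirrors the example parameters above."""
    return {
        "text": text,
        "pitch": 0,
        "speed": 1,
        "volume": 1,
        "bitrate": 128000,
        "channel": "mono",
        "emotion": "auto",
        "voice_id": voice_id,
        "sample_rate": 32000,
        "audio_format": "mp3",
        "language_boost": "None",
        "subtitle_enable": False,
        "english_normalization": False,
    }


def main() -> None:
    # Imported here so build_input works without the client installed.
    import replicate  # pip install replicate

    output = replicate.run(MODEL, input=build_input("Hello, world!"))
    # The output schema is a URI string. Some client versions may return a
    # file-like wrapper instead (assumption), so convert to str to print it.
    print(str(output))


# Only call the API when a token is available.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    main()
```

Every key except `text` has a default, so a payload containing only `text` should also be accepted.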
Input Parameters
text (required) Type: string
Text to narrate (max 10,000 characters). Use markers like <#0.5#> to insert pauses in seconds.
pitch Type: integer | Default: 0 | Range: -12 to 12
Semitone offset applied to the voice (−12 to +12).
speed Type: number | Default: 1 | Range: 0.5 to 2
Speech speed multiplier (0.5–2.0). Lower is slower, higher is faster.
volume Type: number | Default: 1 | Range: 0 to 10
Relative loudness. 1.0 is default MiniMax gain. Range 0–10.
bitrate Default: 128000
MP3 bitrate in bits per second. Only used when audio_format is mp3.
channel Default: mono
mono for 1 channel (default), stereo for 2 channels.
emotion Default: auto
Desired delivery style. Use auto to let MiniMax choose, or pick a specific emotion.
voice_id Type: string | Default: Wise_Woman
Voice to synthesize. Pick any MiniMax system voice or a voice_id returned by https://replicate.com/minimax/voice-cloning.
sample_rate Default: 32000
Audio sample rate in Hz.
audio_format Default: mp3
File format for the generated audio. Choose mp3 for general use, wav/flac for lossless, or pcm for raw bytes.
language_boost Default: None
Optional language hint. Choose Automatic to let MiniMax detect the language, or pick a specific locale.
subtitle_enable Type: boolean | Default: false
Return MiniMax subtitle metadata with sentence timestamps (non-streaming only).
english_normalization Type: boolean | Default: false
Improve number/date reading for English text (adds a small amount of latency).
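
The documented limits above (text length, pitch, speed, volume) and the pause-marker syntax can be checked client-side before submitting a request. The sketch below is illustrative only: `pause` and `validate_inputs` are hypothetical helpers. It assumes that pause markers count toward the 10,000-character limit and that pause durations must be positive, neither of which the page states.

```python
MAX_TEXT_CHARS = 10_000  # documented maximum for the text parameter


def pause(seconds: float) -> str:
    """Return a pause marker such as <#0.5#> for the given duration."""
    if seconds <= 0:  # assumption: only positive pauses make sense
        raise ValueError("pause duration must be positive")
    return f"<#{seconds:g}#>"


def validate_inputs(text: str, pitch: int = 0, speed: float = 1.0,
                    volume: float = 1.0) -> None:
    """Raise ValueError if a value falls outside the documented limits."""
    if not text:
        raise ValueError("text is required")
    # Assumption: the limit applies to the whole string, markers included.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if isinstance(pitch, bool) or not isinstance(pitch, int):
        raise ValueError("pitch must be an integer")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be between -12 and 12 semitones")
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be between 0.5 and 2")
    if not 0 <= volume <= 10:
        raise ValueError("volume must be between 0 and 10")


# Build a sentence with a half-second pause and check it before sending.
text = "Hello, world!" + pause(0.5) + " This is a test."
validate_inputs(text, pitch=2, speed=1.25, volume=1)
print(text)
```

Validating locally avoids spending a prediction on a request the model would reject.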
Output Schema

Output

Type: string | Format: uri
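
Because the output is a URI rather than raw audio bytes, the file has to be fetched separately. A minimal sketch using only the Python standard library follows. `save_audio` is a hypothetical helper, and the sketch assumes the URI can be fetched without extra authentication headers.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio(uri: str, dest: str) -> Path:
    """Stream the audio at `uri` into `dest` and return the written path."""
    path = Path(dest)
    # Copy in chunks so large files are not held in memory all at once.
    with urllib.request.urlopen(uri) as response, path.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return path
```

For example, `save_audio(str(output), "speech.mp3")` would store an MP3 result locally. The file extension should match the `audio_format` that was requested.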

Example Execution Logs
Generating speech with model speech-2.8-hd
Generated speech in 2.08sec
Each character is 1 token
Tokens: 90
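
The logged rule of one token per character can be reproduced locally to estimate usage before a run. This sketch assumes "character" means one Python string character (a Unicode code point). How pause markers are counted is not stated in the logs. `estimate_tokens` is a hypothetical helper.

```python
def estimate_tokens(text: str) -> int:
    """Estimate the logged token count, assuming one token per character."""
    return len(text)


example = (
    "Hello, world! This is a simple test of the MiniMax Speech 2.8 "
    "model thats now on Replicate"
)
print(estimate_tokens(example))  # 90, matching "Tokens: 90" in the logs
```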
Version Details
Version ID
bb4b16034cd66abe0d3147d50a63890e0144328136ca082f3f141f42ed0d4be9
Version Created
February 5, 2026