minimax/voice-cloning

Official, 72.5K runs, May 2025, Cog 0.16.8

text-to-speech voice-cloning

Performance

34.1s typical run time

72.5K total runs

About

Clone voices for use with Minimax's speech-02-hd and speech-02-turbo text-to-speech models.

Example Output


{"model":"speech-02-turbo","preview":"https://replicate.delivery/xezq/p80hlWW4YWptBh3YGnNEDmR8ldh9QQDCxZNrICRge2HgT9UKA/tmpuo0ipa91.mp3","voice_id":"R8_FDU1SV5S"}
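The example output is a flat JSON object with three keys. A minimal sketch of reading it in Python follows; the helper name `extract_voice_id` is hypothetical, and the sketch assumes the output arrives as a JSON string shaped like the example above.

```python
import json

# Example output copied from this page.
RAW_OUTPUT = (
    '{"model":"speech-02-turbo",'
    '"preview":"https://replicate.delivery/xezq/p80hlWW4YWptBh3YGnNEDmR8ldh9QQDCxZNrICRge2HgT9UKA/tmpuo0ipa91.mp3",'
    '"voice_id":"R8_FDU1SV5S"}'
)


def extract_voice_id(raw):
    """Parse the output and return the cloned voice ID (hypothetical helper)."""
    result = json.loads(raw)
    if "voice_id" not in result:
        raise KeyError("output has no voice_id field")
    return result["voice_id"]


result = json.loads(RAW_OUTPUT)
voice_id = extract_voice_id(RAW_OUTPUT)
print(voice_id)           # ID of the cloned voice
print(result["preview"])  # URL of the preview MP3
```

The `voice_id` is the value to keep: the About text says cloned voices are meant for use with speech-02-hd and speech-02-turbo.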

Performance Metrics

34.10s Prediction Time

34.12s Total Time

All Input Parameters

{
  "model": "speech-02-turbo",
  "accuracy": 0.7,
  "voice_file": "https://replicate.delivery/czjl/21U5IFboRwrhBlKks9pmaz119Hvo1ISryE0LNUKuerpqS9UKA/output.wav",
  "need_noise_reduction": false,
  "need_volume_normalization": false
}
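The same input can be sent from Python. This is a sketch, assuming the `replicate` client package is installed and `REPLICATE_API_TOKEN` is set in the environment; the wrapper name `clone_voice` is hypothetical, and the exact type of the returned value may vary with the client version.

```python
import json

# Input values copied from the example above.
EXAMPLE_INPUT = {
    "model": "speech-02-turbo",
    "accuracy": 0.7,
    "voice_file": "https://replicate.delivery/czjl/21U5IFboRwrhBlKks9pmaz119Hvo1ISryE0LNUKuerpqS9UKA/output.wav",
    "need_noise_reduction": False,
    "need_volume_normalization": False,
}


def clone_voice(model_input):
    """Run the model on Replicate (hypothetical wrapper, needs network access)."""
    # Imported lazily so the payload can be built and inspected without the client.
    import replicate

    return replicate.run("minimax/voice-cloning", input=model_input)


if __name__ == "__main__":
    # Print the payload only; call clone_voice(EXAMPLE_INPUT) to start a real run.
    print(json.dumps(EXAMPLE_INPUT, indent=2))
```

Calling `clone_voice(EXAMPLE_INPUT)` starts a real prediction, which took about 34 seconds in the example run on this page.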

Input Parameters

model (default: speech-02-turbo): The text-to-speech model to train.
accuracy (type: number, default: 0.7, range: 0 to 1): Text validation accuracy threshold.
voice_file (required, type: string): Voice file to clone. Must be in MP3, M4A, or WAV format, 10 seconds to 5 minutes long, and smaller than 20MB.
need_noise_reduction (type: boolean, default: false): Enable noise reduction. Use this if the voice file has background noise.
need_volume_normalization (type: boolean, default: false): Enable volume normalization.
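The limits on voice_file and accuracy can be checked locally before uploading. The sketch below is a pre-flight check under stated assumptions, not the model's own validation: the function names are hypothetical, "20MB" is read as 20 MiB, and duration is only measured for PCM WAV files because that is what Python's standard library can read.

```python
import os
import wave

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" interpreted as 20 MiB
MIN_SECONDS, MAX_SECONDS = 10, 5 * 60


def check_voice_file(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is not smaller than 20MB")
    if ext == ".wav":
        # Duration is checked for WAV only; MP3 and M4A need a third-party decoder.
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"duration {seconds:.1f}s is outside 10s to 5min")
    return problems


def check_accuracy(value):
    """True when the accuracy threshold lies in the documented 0 to 1 range."""
    return 0 <= value <= 1
```

For example, `check_voice_file("sample.wav")` returns an empty list for a 30-second recording of a few megabytes, and one message per violated limit otherwise.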

Output Schema

Example Execution Logs

Uploaded voice file in 2.10sec
Cloned voice in 18.48sec
Generating speech with model speech-02-turbo
Generated speech in 13.18sec
Voice cloned successfully with ID: R8_FDU1SV5S
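The timed steps in these logs account for most of the 34.10s prediction time. A small sketch that extracts the per-step timings and sums them, assuming log lines keep the "<step> in <seconds>sec" shape shown above:

```python
import re

# Log lines copied from the example above.
LOGS = """Uploaded voice file in 2.10sec
Cloned voice in 18.48sec
Generating speech with model speech-02-turbo
Generated speech in 13.18sec
Voice cloned successfully with ID: R8_FDU1SV5S"""

# Matches lines of the form "<step> in <seconds>sec"; other lines are skipped.
STEP = re.compile(r"^(.*?) in (\d+\.\d+)sec$", re.MULTILINE)

timings = {label: float(seconds) for label, seconds in STEP.findall(LOGS)}
total = round(sum(timings.values()), 2)
overhead = round(34.10 - total, 2)  # 34.10s is the reported prediction time

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f}s")
print(f"Timed steps: {total:.2f}s, unaccounted: {overhead:.2f}s")
```

The three timed steps sum to 33.76s, leaving 0.34s of the reported prediction time unaccounted for by the logs. Voice cloning itself is the longest step at 18.48s.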

Version Details

Version ID: fff8a670880f066d3742838515a88f7f0a3ae40a4f2e06dae0f7f70ba63582d7
Version Created: November 7, 2025
