lucataco/higgs-audio-v2 📝🔢 → 🖼️

▶️ 1.4K runs 📅 Jul 2025 ⚙️ Cog 0.16.0 🔗 GitHub 📄 Paper ⚖️ License
expressive-tts multilingual multilingual-tts text-to-speech

About

Higgs Audio v2 is a text-to-speech audio foundation model that excels at expressive audio generation.

Example Output

Performance Metrics

4.66s Prediction Time
185.02s Total Time
All Input Parameters
{
  "text": "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years",
  "top_k": 50,
  "top_p": 0.95,
  "temperature": 0.3,
  "max_new_tokens": 1024,
  "system_message": "",
  "scene_description": "Audio is recorded from a quiet room."
}
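
The payload above can be submitted programmatically. The following is a minimal sketch, assuming the official `replicate` Python client is installed (`pip install replicate`) and a `REPLICATE_API_TOKEN` environment variable is set; neither is stated on this page. The version hash is the one listed under Version Details, and `run_prediction` is a hypothetical helper name used only for illustration.

```python
import json

# Model reference in "owner/name:version" form. The version hash is copied
# from the "Version Details" section of this page.
MODEL_REF = (
    "lucataco/higgs-audio-v2:"
    "f5945a3453e6258b3a82b10bfefdfd171fa756c8e42a7c5e07326f3a98ef8e09"
)

# The example payload shown above, unchanged.
EXAMPLE_INPUT = {
    "text": (
        "The sun rises in the east and sets in the west. This simple fact "
        "has been observed by humans for thousands of years"
    ),
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "max_new_tokens": 1024,
    "system_message": "",
    "scene_description": "Audio is recorded from a quiet room.",
}


def run_prediction(model_input):
    """Hypothetical helper: submit one prediction and return its output.

    Assumes the `replicate` package is installed and REPLICATE_API_TOKEN is
    set. The import is deferred so the payload can be built without either.
    """
    import replicate

    return replicate.run(MODEL_REF, input=model_input)


if __name__ == "__main__":
    # Print the payload; uncomment the last line to make a real (billed) call.
    print(json.dumps(EXAMPLE_INPUT, indent=2))
    # output = run_prediction(EXAMPLE_INPUT)
```

Pinning the version hash keeps results reproducible if a newer version of the model is published later.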
Input Parameters
text Type: string; Default: The sun rises in the east and sets in the west
Text to convert to speech.
top_k Type: integer; Default: 50; Range: 1 - 100
Top-k sampling parameter. Limits vocabulary to top k tokens.
top_p Type: number; Default: 0.95; Range: 0.1 - 1
Nucleus sampling parameter. Controls diversity of generated audio.
temperature Type: number; Default: 0.3; Range: 0.1 - 1
Controls randomness in generation. Lower values are more deterministic.
max_new_tokens Type: integer; Default: 1024; Range: 256 - 2048
Maximum number of audio tokens to generate.
system_message Type: string; Default: (empty)
Custom system message (optional).
scene_description Type: string; Default: Audio is recorded from a quiet room.
Scene description for audio context.
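
To make the three sampling parameters concrete, here is a generic sketch of how temperature, top-k, and top-p are commonly combined when choosing the next token. It is an illustration only, not the model's actual implementation: the function name `filter_probs` is hypothetical, and real implementations differ in the order of the steps and in when they renormalize.

```python
import math


def filter_probs(logits, temperature=0.3, top_k=50, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive filtering.

    Generic illustration; the defaults mirror the parameter defaults above.
    """
    # Temperature: scale logits before softmax. Lower values sharpen the
    # distribution, which makes sampling more deterministic.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix of the ranked tokens whose
    # cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    # With a low temperature, almost all probability lands on the top token.
    print(filter_probs([2.0, 1.0, 0.0, -1.0], temperature=0.1))
```

A token would then be drawn at random from the returned distribution. Raising `temperature`, `top_k`, or `top_p` widens the set of candidates and so increases variation between runs.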
Output Schema

Output

Type: string; Format: uri
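
Because the output is a URI rather than raw audio bytes, the file has to be fetched separately. The sketch below uses only the Python standard library; `save_output` is a hypothetical helper name. This page does not state the audio container format, so the destination filename and extension are left to the caller. Depending on the client library and its version, the returned value may be a file-like object instead of a plain string, in which case this helper is not needed.

```python
import shutil
import urllib.request


def save_output(uri, dest_path):
    """Stream the content at `uri` into `dest_path` and return `dest_path`.

    Works with any scheme that urllib can open (for example https or file).
    """
    with urllib.request.urlopen(uri) as response, open(dest_path, "wb") as out:
        # Copy in chunks so large audio files are not held fully in memory.
        shutil.copyfileobj(response, out)
    return dest_path
```

For example, `save_output(output_uri, "speech_output")` would write the generated audio to a local file, where `output_uri` is the string returned by a prediction.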

Example Execution Logs
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
Version Details
Version ID
f5945a3453e6258b3a82b10bfefdfd171fa756c8e42a7c5e07326f3a98ef8e09
Version Created
July 28, 2025