lucataco/higgs-audio-v2 📝🔢 → 🖼️

▶️ 2.4K runs 📅 Jul 2025 ⚙️ Cog 0.16.0 🔗 GitHub 📄 Paper ⚖️ License

expressive-tts multilingual multilingual-tts text-to-speech

About

Higgs Audio v2 is a text-to-speech audio foundation model that excels at expressive audio generation.

Example Output

Performance Metrics

4.66s Prediction Time

185.02s Total Time

All Input Parameters

{
  "text": "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years",
  "top_k": 50,
  "top_p": 0.95,
  "temperature": 0.3,
  "max_new_tokens": 1024,
  "system_message": "",
  "scene_description": "Audio is recorded from a quiet room."
}

Input Parameters

text (string, default: "The sun rises in the east and sets in the west"): Text to convert to speech.
top_k (integer, default: 50, range: 1 to 100): Top-k sampling parameter. Limits the vocabulary to the top k tokens.
top_p (number, default: 0.95, range: 0.1 to 1): Nucleus sampling parameter. Controls the diversity of the generated audio.
temperature (number, default: 0.3, range: 0.1 to 1): Controls randomness in generation. Lower values are more deterministic.
max_new_tokens (integer, default: 1024, range: 256 to 2048): Maximum number of audio tokens to generate.
system_message (string, default: empty): Custom system message (optional).
scene_description (string, default: "Audio is recorded from a quiet room."): Scene description for audio context.
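
As a minimal sketch of how these parameters might be used, the snippet below builds an input payload, checks the numeric values against the ranges listed above, and passes it to the model through the `replicate` Python client. The helper names `build_input` and `synthesize` are hypothetical and not part of the model or the client. The sketch assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set; the exact type returned by `replicate.run` depends on the client version.

```python
# Model reference built from the owner/name and Version ID shown on this page.
MODEL_VERSION = (
    "lucataco/higgs-audio-v2:"
    "f5945a3453e6258b3a82b10bfefdfd171fa756c8e42a7c5e07326f3a98ef8e09"
)

# (min, max) ranges copied from the parameter list above.
RANGES = {
    "top_k": (1, 100),
    "top_p": (0.1, 1.0),
    "temperature": (0.1, 1.0),
    "max_new_tokens": (256, 2048),
}


def build_input(text, top_k=50, top_p=0.95, temperature=0.3,
                max_new_tokens=1024, system_message="",
                scene_description="Audio is recorded from a quiet room."):
    """Hypothetical helper: return an input dict, or raise ValueError
    if a numeric value falls outside its documented range."""
    payload = {
        "text": text,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "system_message": system_message,
        "scene_description": scene_description,
    }
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name}={payload[name]} is outside {low} to {high}")
    return payload


def synthesize(text, **overrides):
    """Hypothetical helper: run the model via the `replicate` client.
    Requires network access and REPLICATE_API_TOKEN in the environment."""
    import replicate  # third-party package: pip install replicate

    return replicate.run(MODEL_VERSION, input=build_input(text, **overrides))
```

Validating locally is a design choice rather than a requirement: it surfaces an out-of-range value before a prediction is queued. A call such as `synthesize("Hello there.", temperature=0.5)` would then submit the request with the remaining defaults.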

Output Schema

Output

Type: string • Format: uri
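
Because the output is a URI string rather than raw audio, a caller typically has to fetch the file itself. The sketch below shows one way to do that with the standard library. The function names and the fallback file name `output.wav` are assumptions for illustration; this page does not state the audio format.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_uri(uri, default="output.wav"):
    """Take the last path segment of the URI as the file name.
    Falls back to `default` (an assumed name) when the path is empty."""
    name = PurePosixPath(urlparse(uri).path).name
    return name or default


def download(uri, directory="."):
    """Fetch the generated audio and save it under `directory`.
    Requires network access; returns the path of the saved file."""
    target = Path(directory) / filename_from_uri(uri)
    with urlopen(uri) as response:
        target.write_bytes(response.read())
    return target
```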

Example Execution Logs

`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48

Version Details

Version ID: f5945a3453e6258b3a82b10bfefdfd171fa756c8e42a7c5e07326f3a98ef8e09
Version Created: July 28, 2025