lucataco/higgs-audio-v2
About
Higgs Audio v2 is a powerful text-to-speech audio foundation model that excels at expressive audio generation.

Example Output
Output
Performance Metrics
- Prediction Time: 4.66s
- Total Time: 185.02s
All Input Parameters
{ "text": "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years", "top_k": 50, "top_p": 0.95, "temperature": 0.3, "max_new_tokens": 1024, "system_message": "", "scene_description": "Audio is recorded from a quiet room." }
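The parameters above can be assembled and submitted programmatically. The following is a minimal sketch, assuming the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `synthesize` are hypothetical, the defaults simply mirror the example above, and the only validation shown is an illustrative empty-text check rather than a documented limit of this model.

```python
import json

# "owner/name:version" reference, built from the Version ID listed on this page.
MODEL_REF = (
    "lucataco/higgs-audio-v2:"
    "f5945a3453e6258b3a82b10bfefdfd171fa756c8e42a7c5e07326f3a98ef8e09"
)


def build_input(
    text,
    scene_description="Audio is recorded from a quiet room.",
    system_message="",
    top_k=50,
    top_p=0.95,
    temperature=0.3,
    max_new_tokens=1024,
):
    """Build the input payload (hypothetical helper; defaults copy the example)."""
    if not text.strip():
        # Illustrative check only, not a documented constraint of the model.
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "system_message": system_message,
        "scene_description": scene_description,
    }


def synthesize(payload):
    """Submit the payload to Replicate (needs network access and an API token)."""
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    payload = build_input("The sun rises in the east and sets in the west.")
    print(json.dumps(payload, indent=2))
    # To generate audio, call: output = synthesize(payload)
```

What `synthesize` returns (for example a URL or a file-like object) depends on the client version, so inspect the result before writing it to disk.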
Input Parameters
- text: Text to convert to speech.
- top_k: Top-k sampling parameter. Limits vocabulary to top k tokens.
- top_p: Nucleus sampling parameter. Controls diversity of generated audio.
- temperature: Controls randomness in generation. Lower values are more deterministic.
- max_new_tokens: Maximum number of audio tokens to generate.
- system_message: Custom system message (optional).
- scene_description: Scene description for audio context.
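To make the three sampling parameters concrete, here is a generic sketch of how temperature, top-k, and top-p filtering are commonly combined when sampling one token from a list of logits. It illustrates the general technique only and is not taken from the Higgs Audio v2 code; the function names `filter_distribution` and `sample` are hypothetical.

```python
import math
import random


def filter_distribution(logits, top_k=50, top_p=0.95, temperature=0.3):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: dividing logits by a value below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p (nucleus): keep the smallest prefix whose renormalized
    # cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With a low temperature such as the 0.3 used in the example input, most of the probability mass concentrates on the top few tokens, so the top-p cutoff is often reached after very few candidates.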
Output Schema
Output
Example Execution Logs
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
Version Details
- Version ID: f5945a3453e6258b3a82b10bfefdfd171fa756c8e42a7c5e07326f3a98ef8e09
- Version Created: July 28, 2025