lucataco/csm-1b 📝❓🔢 → 🖼️
About
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.
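For readers unfamiliar with the term, RVQ (residual vector quantization) encodes a vector as a sequence of codebook indices, where each stage quantizes the residual left by the previous stage. The sketch below is a toy illustration of that general idea only. The codebooks, dimensions, and function names are hypothetical and are not taken from CSM or its audio codec.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]

# Hypothetical two-stage codebooks for 2-D vectors (illustration only).
CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 1.0]],      # coarse stage
    [[0.0, 0.0], [0.25, -0.25]],   # refinement stage
]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Return the index of the codebook entry closest to v (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def rvq_encode(v: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize v stage by stage, each stage coding the remaining residual."""
    residual = list(v)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct the vector by summing the selected entry of every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codes = rvq_encode([1.2, 0.8], CODEBOOKS)
print(codes)                         # [1, 1]
print(rvq_decode(codes, CODEBOOKS))  # [1.25, 0.75]
```

Each additional stage refines the approximation, which is why a frame of audio can be represented as a short list of integer codes.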

Example Output
Performance Metrics
- Prediction Time: 5.99s
- Total Time: 5.99s
All Input Parameters
{ "text": "This is CSM by Sesame, generate FVQ audio codes from text", "speaker": 0, "max_audio_length_ms": 10000 }
Input Parameters
- text: Text to convert to speech
- speaker: Speaker ID (0 or 1)
- max_audio_length_ms: Maximum audio length in milliseconds
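As a minimal sketch, the three parameters above can be assembled and checked before a request is sent. The helper name `build_csm_input` is hypothetical. Its default values mirror the example input on this page rather than any documented defaults, and the requirement that `max_audio_length_ms` be positive is an assumption.

```python
from typing import Any, Dict


def build_csm_input(
    text: str,
    speaker: int = 0,                  # assumed default, taken from the example input
    max_audio_length_ms: int = 10000,  # assumed default, taken from the example input
) -> Dict[str, Any]:
    """Build the input payload using the parameter names listed above."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if speaker not in (0, 1):
        raise ValueError("speaker must be 0 or 1")
    if max_audio_length_ms <= 0:  # assumption: only positive lengths make sense
        raise ValueError("max_audio_length_ms must be positive")
    return {
        "text": text,
        "speaker": speaker,
        "max_audio_length_ms": max_audio_length_ms,
    }


payload = build_csm_input("Hello from CSM", speaker=1, max_audio_length_ms=5000)
print(payload)
```

Validating locally keeps an out-of-range speaker ID from costing a full prediction round trip.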
Output Schema
Output
Version Details
- Version ID: 3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03
- Version Created: March 21, 2025
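Pinning the version ID makes results reproducible if the model is updated later. The sketch below joins the model name and version ID into a single `owner/name:version` reference. That format and the helper name `pinned_reference` are assumptions, based on the convention used by Replicate-style hosting, which this page does not name explicitly.

```python
import re

MODEL = "lucataco/csm-1b"
VERSION_ID = "3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03"


def pinned_reference(model: str, version_id: str) -> str:
    """Return 'owner/name:version' after a basic sanity check on the ID."""
    # Assumption: version IDs are lowercase hexadecimal strings.
    if not re.fullmatch(r"[0-9a-f]+", version_id):
        raise ValueError("version_id should be a lowercase hex string")
    return f"{model}:{version_id}"


print(pinned_reference(MODEL, VERSION_ID))
```

If the model is in fact served through Replicate, a reference of this shape is what the official Python client accepts as the first argument of `replicate.run(ref, input=...)`.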