lucataco/csm-1b
About
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ (residual vector quantization) audio codes from text and audio inputs.
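"RVQ audio codes" means each chunk of audio is represented as a short list of integer indices, one per codebook. Every later codebook quantizes the residual error left by the earlier ones. The sketch below is a toy illustration of that idea with tiny hand-picked 2-D codebooks (the `CODEBOOKS` values and the helper names are hypothetical). It is not the audio codec CSM actually uses.

```python
# Toy residual vector quantization (RVQ). Each stage quantizes the residual
# left over by the previous stages. Illustration only; not CSM's audio codec.

Vector = list[float]


def nearest(codebook: list[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(codebooks: list[list[Vector]], x: Vector) -> list[int]:
    """Return one code index per codebook for the input vector x."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; the next stage quantizes what remains.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks: list[list[Vector]], codes: list[int]) -> Vector:
    """Reconstruct a vector by summing the selected entry of every codebook."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical codebooks: a coarse stage followed by a finer stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],
]

codes = rvq_encode(CODEBOOKS, [1.2, 1.0])
print(codes)                          # [1, 1]
print(rvq_decode(CODEBOOKS, codes))   # [1.25, 1.0]
```

Using only the first code gives the coarse approximation `[1.0, 1.0]`. Adding the second codebook's contribution moves the reconstruction closer to the input. This is why a model can emit several codes per audio frame.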
Example Output
Performance Metrics
- Prediction Time: 5.99s
- Total Time: 5.99s
All Input Parameters
{
"text": "This is CSM by Sesame, generate FVQ audio codes from text",
"speaker": 0,
"max_audio_length_ms": 10000
}
Input Parameters
- text: Text to convert to speech
- speaker: Speaker ID (0 or 1)
- max_audio_length_ms: Maximum audio length in milliseconds
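The three documented parameters map directly onto the JSON input of the example run. The following is a minimal sketch of assembling and checking that input locally before submitting it. `build_csm_input` is a hypothetical helper, not part of any Replicate or Sesame library. The defaults mirror the example run and are not necessarily the model's own defaults. The non-empty-text and positive-length checks are assumptions, while the speaker check follows the "0 or 1" note above.

```python
import json


def build_csm_input(text: str, speaker: int = 0, max_audio_length_ms: int = 10000) -> dict:
    """Hypothetical helper: build the lucataco/csm-1b input dict.

    Defaults copy the example run on this page (assumption: they may differ
    from the model's real defaults).
    """
    if not text.strip():
        raise ValueError("text must be non-empty")  # assumed constraint
    if speaker not in (0, 1):
        raise ValueError("speaker must be 0 or 1")  # documented speaker IDs
    if max_audio_length_ms <= 0:
        raise ValueError("max_audio_length_ms must be positive")  # assumed constraint
    return {
        "text": text,
        "speaker": speaker,
        "max_audio_length_ms": max_audio_length_ms,
    }


# Reproduce the example run's input exactly as recorded on the page.
payload = build_csm_input("This is CSM by Sesame, generate FVQ audio codes from text")
print(json.dumps(payload, indent=2))
```

A dict built this way could then be passed as the `input` argument of the Replicate Python client's `replicate.run(...)`.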
Version Details
- Version ID: 3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03
- Version Created: March 21, 2025
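Replicate identifies a specific model version as `owner/name:version_id`. Pinning the version ID listed here keeps results tied to this exact build. The sketch below assembles that reference string. `pinned_ref` is a hypothetical helper. The 64-character lowercase hex check is an assumption based on the version ID shown on this page, not a documented rule.

```python
import re

# Assumed format, inferred from the version ID on this page: 64 lowercase hex chars.
VERSION_ID_RE = re.compile(r"[0-9a-f]{64}")


def pinned_ref(model: str, version_id: str) -> str:
    """Hypothetical helper: join a model name and version ID as 'owner/name:version'."""
    if not VERSION_ID_RE.fullmatch(version_id):
        raise ValueError(f"unexpected version ID format: {version_id!r}")
    return f"{model}:{version_id}"


ref = pinned_ref(
    "lucataco/csm-1b",
    "3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03",
)
print(ref)
```

The resulting string is the form accepted as the model reference by `replicate.run(...)` in the Replicate Python client.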