lucataco/csm-1b

3.1K runs • Mar 2025 • Cog 0.14.2

Performance

Typical run time: 6.0s

Total runs: 3.1K

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

Prediction Time: 5.99s

Total Time: 5.99s

All Input Parameters

{
  "text": "This is CSM by Sesame, generate FVQ audio codes from text",
  "speaker": 0,
  "max_audio_length_ms": 10000
}

Input Parameters

text (Type: string, Default: "Hello from Sesame."): Text to convert to speech
speaker (Default: 0): Speaker ID (0 or 1)
max_audio_length_ms (Type: integer, Default: 10000, Range: 1000 - 30000): Maximum audio length in milliseconds
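
The defaults and limits listed above can be checked on the client side before a prediction is submitted. The following is a minimal sketch, not part of the model itself: `build_input` is a hypothetical helper name, and the rule that `text` must be non-empty is an assumption (the page only gives its type and default).

```python
from typing import Any

# Defaults and limits copied from the parameter list on this page.
DEFAULT_TEXT = "Hello from Sesame."
VALID_SPEAKERS = (0, 1)
MIN_AUDIO_MS, MAX_AUDIO_MS = 1000, 30000


def build_input(
    text: str = DEFAULT_TEXT,
    speaker: int = 0,
    max_audio_length_ms: int = 10000,
) -> dict[str, Any]:
    """Hypothetical helper: validate the values and return the input payload."""
    # Assumption: an empty string is rejected; the page does not state this.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if speaker not in VALID_SPEAKERS:
        raise ValueError("speaker must be 0 or 1")
    if not isinstance(max_audio_length_ms, int):
        raise ValueError("max_audio_length_ms must be an integer")
    if not MIN_AUDIO_MS <= max_audio_length_ms <= MAX_AUDIO_MS:
        raise ValueError(
            f"max_audio_length_ms must be between {MIN_AUDIO_MS} and {MAX_AUDIO_MS}"
        )
    return {
        "text": text,
        "speaker": speaker,
        "max_audio_length_ms": max_audio_length_ms,
    }
```

Calling `build_input()` with no arguments returns a payload made of the documented defaults; a value outside the documented range raises `ValueError` before any request is sent.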

Output Schema

Output

Type: string • Format: uri

Version Details

Version ID: 3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03
Version Created: March 21, 2025
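
A prediction can be pinned to this exact version so that later releases do not change results. The sketch below assumes the third-party `replicate` Python client is installed and that `REPLICATE_API_TOKEN` is set in the environment; `model_ref` and `run_and_save` are hypothetical helper names. Because the output schema only says the result is a URI string, the destination filename (and its extension) is left to the caller.

```python
import urllib.request

MODEL = "lucataco/csm-1b"
# Version ID as listed under Version Details.
VERSION = "3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the 'owner/name:version' reference used to pin a version."""
    return f"{model}:{version}"


def run_and_save(payload: dict, dest: str) -> str:
    """Hypothetical helper: run one prediction and download the result to dest."""
    # Imported lazily so the rest of this module works without the client.
    import replicate  # third-party; reads REPLICATE_API_TOKEN from the environment

    output = replicate.run(model_ref(), input=payload)
    # Depending on the client version, the output may be a plain URL string
    # or a file-like object exposing a .url attribute; handle both.
    url = getattr(output, "url", None) or str(output)
    urllib.request.urlretrieve(url, dest)
    return dest
```

A call might look like `run_and_save({"text": "Hello from Sesame.", "speaker": 0, "max_audio_length_ms": 10000}, "speech_output")`, which writes the downloaded file to the given path.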