lucataco/csm-1b

954 runs, Mar 2025, Cog 0.14.2
text-to-speech

About

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ (residual vector quantization) audio codes from text and audio inputs.

Example Output

Performance Metrics

5.99s Prediction Time
5.99s Total Time
All Input Parameters
{
  "text": "This is CSM by Sesame, generate RVQ audio codes from text",
  "speaker": 0,
  "max_audio_length_ms": 10000
}
Input Parameters
text Type: string, Default: Hello from Sesame.
Text to convert to speech
speaker Default: 0
Speaker ID (0 or 1)
max_audio_length_ms Type: integer, Default: 10000, Range: 1000 - 30000
Maximum audio length in milliseconds
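The parameters listed above can be checked on the client side before a prediction is submitted. The following is a minimal sketch, not an official client: `build_input`, `run_prediction`, and `MODEL_REF` are hypothetical names introduced here, and the call assumes the `replicate` Python package is installed and `REPLICATE_API_TOKEN` is set in the environment. The defaults and ranges are taken from this page.

```python
from typing import Any, Dict

# Model name plus the version ID listed under "Version Details" on this page.
MODEL_REF = (
    "lucataco/csm-1b:"
    "3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03"
)


def build_input(
    text: str = "Hello from Sesame.",
    speaker: int = 0,
    max_audio_length_ms: int = 10000,
) -> Dict[str, Any]:
    """Check values against the documented ranges and return the payload."""
    if not isinstance(text, str):
        raise ValueError("text must be a string")
    if speaker not in (0, 1):
        raise ValueError("speaker must be 0 or 1")
    if not 1000 <= max_audio_length_ms <= 30000:
        raise ValueError("max_audio_length_ms must be between 1000 and 30000")
    return {
        "text": text,
        "speaker": speaker,
        "max_audio_length_ms": max_audio_length_ms,
    }


def run_prediction(payload: Dict[str, Any]) -> Any:
    """Submit the payload to Replicate (assumes the `replicate` package and an API token)."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_prediction(build_input(text="Hello from Sesame.", speaker=1))` would submit a request with the second speaker ID; invalid values raise `ValueError` locally instead of producing a failed prediction.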
Output Schema

Output

Type: string, Format: uri
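Because the output is a URI string rather than raw audio, the file has to be fetched in a second step. The sketch below uses only the Python standard library; `save_audio` is a hypothetical helper, and the default file name `output.wav` is an assumption, since this page does not state the audio container format. Some client library versions may return a file-like object instead of a plain string, in which case its URL would need to be extracted first.

```python
import shutil
import urllib.request


def save_audio(uri: str, path: str = "output.wav") -> str:
    """Stream the resource at `uri` into `path` and return the path."""
    with urllib.request.urlopen(uri) as response, open(path, "wb") as fh:
        # Copy in chunks so the whole file is never held in memory at once.
        shutil.copyfileobj(response, fh)
    return path
```

For example, `save_audio(output_uri, "hello.wav")` would write the generated speech to `hello.wav`, where `output_uri` is the string returned by a prediction.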

Version Details
Version ID
3e59b10a9894c54ae5f2fc0347e3a2f5c82f0574407e53a7d9f76ec7c502ad03
Version Created
March 21, 2025