lucataco/step-audio-tts-3b

1.1K runs · Feb 2025 · Cog 0.13.7
multilingual rap singing-voice-generation text-to-speech

About

Step-Audio-TTS-3B is the industry's first Text-to-Speech (TTS) model trained on a large-scale synthetic dataset using the LLM-Chat paradigm.

Example Output

Performance Metrics

53.35s Prediction Time
66.04s Total Time
All Input Parameters
{
  "text": "(RAP) I set out on the journey of freedom, chasing that distant dream, breaking free from the shackles of bondage, letting my soul drift with the wind, every step is full of power, every moment is extremely shining, the belief in freedom is burning, illuminating the direction of my progress!",
  "speaker_name": "闫雨婷"
}
Input Parameters
text Type: string Default: (RAP) I set out on the journey of freedom, chasing that distant dream, breaking free from the shackles of bondage, letting my soul drift with the wind, every step is full of power, every moment is extremely shining, the belief in freedom is burning, illuminating the direction of my progress!
Text to synthesize into speech
speaker_name Default: 闫雨婷
Speaker name
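
The two parameters above can be packaged into a prediction request. The following is a minimal sketch, not the page's official client code: it assumes Replicate's public HTTP API endpoint for creating predictions and a token stored in the `REPLICATE_API_TOKEN` environment variable. The helper names `build_input` and `build_request` are hypothetical, and the request is only built, not sent.

```python
import json
import os
import urllib.request

# Version ID copied from the "Version Details" section of this page.
VERSION_ID = "8c30688893eb0a713273758033ea410a4022060142ac3ee5b93937ecbc27209f"

# Assumption: Replicate's public HTTP endpoint for creating predictions.
API_URL = "https://api.replicate.com/v1/predictions"


def build_input(text, speaker_name="闫雨婷"):
    """Return the input object with the two parameters listed on this page."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    return {"text": text, "speaker_name": speaker_name}


def build_request(text, speaker_name="闫雨婷", token=None):
    """Build (but do not send) the HTTP request that creates a prediction."""
    # Assumption: the API token lives in the REPLICATE_API_TOKEN variable.
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    body = {"version": VERSION_ID, "input": build_input(text, speaker_name)}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job. Given the prediction time reported above (about 53 seconds), a caller should expect to wait or poll for the result rather than read it immediately.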
Output Schema

Output

Type: string Format: uri
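
Because the output is a single URI string rather than raw audio, a caller has to fetch the file separately. A rough sketch of that step follows. The helper names `filename_from_uri` and `save_output` are hypothetical, and the fallback name `output.bin` is an assumption, since this page does not state the audio file format.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def filename_from_uri(uri, default="output.bin"):
    """Take the last path segment of the URI as a local file name."""
    path = urllib.parse.urlparse(uri).path
    name = posixpath.basename(path)
    # Fall back to a placeholder name when the URI has no file segment.
    return name or default


def save_output(uri, directory="."):
    """Download the file behind the output URI and return the local path."""
    target = os.path.join(directory, filename_from_uri(uri))
    urllib.request.urlretrieve(uri, target)
    return target
```

Deriving the name from the URI path keeps whatever extension the service chose, so the code does not need to guess the format.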

Version Details
Version ID
8c30688893eb0a713273758033ea410a4022060142ac3ee5b93937ecbc27209f
Version Created
February 17, 2025