lucataco/step-audio-tts-3b

1.1K runs · Feb 2025 · Cog 0.13.7

multilingual rap singing-voice-generation text-to-speech

About

Step-Audio-TTS-3B is presented as the industry's first Text-to-Speech (TTS) model trained on a large-scale synthetic dataset using the LLM-Chat paradigm.

Example Output

Performance Metrics

53.35s Prediction Time

66.04s Total Time

All Input Parameters

{
  "text": "（RAP) I set out on the journey of freedom, chasing that distant dream, breaking free from the shackles of bondage, letting my soul drift with the wind, every step is full of power, every moment is extremely shining, the belief in freedom is burning, illuminating the direction of my progress!",
  "speaker_name": "闫雨婷"
}

Input Parameters

text (Type: string; Default: "（RAP) I set out on the journey of freedom, chasing that distant dream, breaking free from the shackles of bondage, letting my soul drift with the wind, every step is full of power, every moment is extremely shining, the belief in freedom is burning, illuminating the direction of my progress!"): Text to synthesize into speech
speaker_name (Default: 闫雨婷): Speaker name
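
The two parameters above can be passed to the model through the Replicate Python client. The following is a minimal sketch, not the author's reference usage: it assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` is set in the environment, and the helper names `build_input` and `synthesize` are hypothetical. The version string is the Version ID listed on this page.

```python
# Model reference assembled from this page: owner/name plus the Version ID.
MODEL = "lucataco/step-audio-tts-3b"
VERSION = "8c30688893eb0a713273758033ea410a4022060142ac3ee5b93937ecbc27209f"


def build_input(text: str, speaker_name: str = "闫雨婷") -> dict:
    """Build the input payload from the two documented parameters.

    `build_input` is a hypothetical helper; only the keys `text` and
    `speaker_name` come from the model page.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"text": text, "speaker_name": speaker_name}


def synthesize(text: str, speaker_name: str = "闫雨婷"):
    """Run the pinned model version on Replicate and return its output.

    Assumes `pip install replicate` and REPLICATE_API_TOKEN in the environment.
    """
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(f"{MODEL}:{VERSION}", input=build_input(text, speaker_name))


# Example call (requires network access and an API token):
# output = synthesize("(RAP) Every step is full of power!")
```

Pinning the full version ID keeps results tied to the build shown here; the example run on this page took about 53 seconds of prediction time, so expect the call to block for a while.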

Output Schema

Output

Type: string • Format: uri
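
Because the output is a URI rather than raw audio, the file has to be fetched separately. The sketch below shows one way to save it using only the standard library. `save_output` is a hypothetical helper, and the branch for objects with a `read()` method is an assumption covering client versions that wrap the URI in a file-like object instead of returning a plain string.

```python
import urllib.request
from pathlib import Path


def save_output(output, dest: str) -> Path:
    """Write the model output to `dest` and return the resulting path.

    Accepts either a URI string (as the output schema describes) or a
    file-like object exposing read() (an assumed alternative return type).
    """
    path = Path(dest)
    if hasattr(output, "read"):
        data = output.read()
    else:
        # Fetch the bytes behind the URI.
        with urllib.request.urlopen(str(output)) as resp:
            data = resp.read()
    path.write_bytes(data)
    return path


# Example (the file name and extension are assumptions; the page does not
# state the audio format):
# save_output(output, "speech_output.wav")
```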

Version Details

Version ID: 8c30688893eb0a713273758033ea410a4022060142ac3ee5b93937ecbc27209f
Version Created: February 17, 2025
