adirik/styletts2 🔢📝🖼️ → 🖼️

▶️ 131.8K runs 📅 Nov 2023 ⚙️ Cog 0.9.0-beta10
text-to-speech voice-cloning

About

Generates speech from text

Example Output

Performance Metrics

5.43s Prediction Time
7.22s Total Time
All Input Parameters
{
  "beta": 0.7,
  "seed": 0,
  "text": "StyleTTS 2 is a text-to-speech model that leverages style diffusion and adversarial training with large speech language models to achieve human-level text-to-speech synthesis.",
  "alpha": 0.3,
  "diffusion_steps": 10,
  "embedding_scale": 1.5
}
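
The parameters above map directly onto an API call. The following is a minimal sketch, assuming the third-party `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `model_ref` and `synthesize` are illustrative and not part of the model.

```python
# Model name and version ID as listed on this page.
MODEL = "adirik/styletts2"
VERSION = "989cb5ea6d2401314eb30685740cb9f6fd1c9001b8940659b406f952837ab5ac"

# The example input shown above, unchanged.
EXAMPLE_INPUT = {
    "beta": 0.7,
    "seed": 0,
    "text": (
        "StyleTTS 2 is a text-to-speech model that leverages style diffusion "
        "and adversarial training with large speech language models to achieve "
        "human-level text-to-speech synthesis."
    ),
    "alpha": 0.3,
    "diffusion_steps": 10,
    "embedding_scale": 1.5,
}


def model_ref(model=MODEL, version=VERSION):
    """Build the 'owner/name:version' reference string (hypothetical helper)."""
    return f"{model}:{version}"


def synthesize(inputs):
    """Run one prediction and return whatever the client returns.

    Assumption: the `replicate` package is installed; it is imported lazily so
    this module can be loaded without it.
    """
    import replicate  # third-party client, reads REPLICATE_API_TOKEN

    return replicate.run(model_ref(), input=inputs)


# Usage (performs a network call against your Replicate account):
# output = synthesize(EXAMPLE_INPUT)
# print(output)
```

Pinning the version ID keeps results tied to the exact build described under Version Details.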
Input Parameters
beta Type: number, Default: 0.7, Range: 0 - 1
Determines the prosody of the speaker. Only used for long text inputs or when a reference speaker is provided. Use lower values to sample style based on previous or reference speech instead of text.
seed Type: integer, Default: 0
Seed for reproducibility
text (required) Type: string
Text to convert to speech
alpha Type: number, Default: 0.3, Range: 0 - 1
Determines the timbre of the speaker. Only used for long text inputs or when a reference speaker is provided. Use lower values to sample style based on previous or reference speech instead of text.
weights Type: string
Replicate weights URL for inference with a model fine-tuned on new speakers. If provided, a reference speech must also be provided. If omitted, the default model is used.
reference Type: string
Reference speech to copy style from
diffusion_steps Type: integer, Default: 10, Range: 0 - 50
Number of diffusion steps
embedding_scale Type: number, Default: 1, Range: 0 - 5
Embedding scale. Use higher values for more pronounced emotion.
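
The defaults, ranges, and the weights/reference dependency listed above can be checked before a request is sent. The sketch below is one way to do that; `NUMERIC_PARAMS` and `build_input` are hypothetical names, the ranges are assumed to be inclusive, and rejecting empty text is an added assumption rather than documented behavior.

```python
# (default, low, high) for each ranged parameter, copied from the listing above.
NUMERIC_PARAMS = {
    "beta": (0.7, 0, 1),
    "alpha": (0.3, 0, 1),
    "diffusion_steps": (10, 0, 50),
    "embedding_scale": (1, 0, 5),
}


def build_input(text, seed=0, reference=None, weights=None, **overrides):
    """Return an input dict, filling defaults and checking documented ranges."""
    if not isinstance(text, str) or not text:
        # Assumption: an empty string is treated as a missing required value.
        raise ValueError("text is required and must be a non-empty string")

    unknown = set(overrides) - set(NUMERIC_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    # Documented rule: fine-tuned weights need a reference speech.
    if weights is not None and reference is None:
        raise ValueError("weights requires a reference speech")

    payload = {"text": text, "seed": seed}
    for name, (default, low, high) in NUMERIC_PARAMS.items():
        value = overrides.get(name, default)
        if not low <= value <= high:  # assumed inclusive bounds
            raise ValueError(f"{name}={value} is outside {low} - {high}")
        payload[name] = value

    # Optional fields are only sent when the caller supplies them.
    if reference is not None:
        payload["reference"] = reference
    if weights is not None:
        payload["weights"] = weights
    return payload
```

Calling `build_input("Hello", alpha=0.1)` keeps every other parameter at its listed default, which makes it easy to vary one setting at a time.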
Output Schema

Output

Type: string, Format: uri
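
Because the output is a URI string rather than raw audio, the generated speech has to be fetched separately. A minimal sketch using only the Python standard library follows; `download_output` is a hypothetical helper, and the audio container format is not stated on this page, so the destination file name is left to the caller. Depending on the client library and its version, the returned value may be a file-like object rather than a plain string; this sketch assumes a URI string, as in the schema.

```python
import shutil
import urllib.request
from pathlib import Path


def download_output(uri, dest):
    """Stream the resource at `uri` to the local path `dest` and return it."""
    dest = Path(dest)
    # urlopen yields a file-like response; copy it in chunks to avoid
    # holding the whole audio file in memory.
    with urllib.request.urlopen(uri) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```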

Version Details
Version ID
989cb5ea6d2401314eb30685740cb9f6fd1c9001b8940659b406f952837ab5ac
Version Created
January 31, 2024