adirik/styletts2

131.8K runs • Nov 2023 • Cog 0.9.0-beta10 • GitHub • Paper • License

text-to-speech voice-cloning

About

Generates speech from text, optionally copying the speaking style of a reference speech clip.

Example Output


Performance Metrics

Prediction time: 5.43s

Total time: 7.22s

All Input Parameters

{
  "beta": 0.7,
  "seed": 0,
  "text": "StyleTTS 2 is a text-to-speech model that leverages style diffusion and adversarial training with large speech language models to achieve human-level text-to-speech synthesis.",
  "alpha": 0.3,
  "diffusion_steps": 10,
  "embedding_scale": 1.5
}
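The example prediction above can be reproduced from code. This is a minimal sketch, assuming the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set; `MODEL_REF`, `EXAMPLE_INPUT`, and `run_example` are hypothetical names. The model reference pins the version ID listed under Version Details.

```python
# Version ID copied from "Version Details"; pinning it keeps requests on this exact model version.
MODEL_REF = (
    "adirik/styletts2:"
    "989cb5ea6d2401314eb30685740cb9f6fd1c9001b8940659b406f952837ab5ac"
)

# Same values as the example's "All Input Parameters" JSON.
EXAMPLE_INPUT = {
    "beta": 0.7,
    "seed": 0,
    "text": (
        "StyleTTS 2 is a text-to-speech model that leverages style diffusion "
        "and adversarial training with large speech language models to achieve "
        "human-level text-to-speech synthesis."
    ),
    "alpha": 0.3,
    "diffusion_steps": 10,
    "embedding_scale": 1.5,
}


def run_example(api_input=EXAMPLE_INPUT):
    """Send the prediction to Replicate (network call; needs REPLICATE_API_TOKEN)."""
    import replicate  # imported lazily so this module loads without the client installed

    # The output is documented as a URI; depending on the client version,
    # the returned object may be a URL string or a file-like wrapper.
    return replicate.run(MODEL_REF, input=api_input)
```

Calling `run_example()` performs a network request, so it is kept inside a function rather than executed at import time.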

Input Parameters

beta • Type: number • Default: 0.7 • Range: 0 - 1. Used only for long text inputs or when a reference speaker is provided; determines the speaker's prosody. Use lower values to sample the style from the previous or reference speech instead of the text.
seed • Type: integer • Default: 0. Random seed for reproducibility.
text (required) • Type: string. Text to convert to speech.
alpha • Type: number • Default: 0.3 • Range: 0 - 1. Used only for long text inputs or when a reference speaker is provided; determines the speaker's timbre. Use lower values to sample the style from the previous or reference speech instead of the text.
weights • Type: string. URL of Replicate weights for a model fine-tuned on new speakers. If provided, a reference speech must also be provided. If omitted, the default model is used.
reference • Type: string. Reference speech to copy the style from.
diffusion_steps • Type: integer • Default: 10 • Range: 0 - 50. Number of diffusion steps.
embedding_scale • Type: number • Default: 1 • Range: 0 - 5. Embedding scale; use higher values for more pronounced emotion.
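The constraints above (required `text`, the documented ranges, and the rule that `weights` needs a `reference`) can be checked client-side before a request is sent. A minimal sketch; `DEFAULTS`, `RANGES`, and `validate_input` are hypothetical names, and the model's own server-side validation may behave differently.

```python
# Defaults and inclusive ranges as listed under "Input Parameters".
DEFAULTS = {
    "alpha": 0.3,
    "beta": 0.7,
    "seed": 0,
    "diffusion_steps": 10,
    "embedding_scale": 1,
}

RANGES = {
    "alpha": (0, 1),
    "beta": (0, 1),
    "diffusion_steps": (0, 50),
    "embedding_scale": (0, 5),
}


def validate_input(params: dict) -> dict:
    """Return a copy of params with defaults filled in; raise ValueError on bad input."""
    if not params.get("text"):
        raise ValueError("'text' is required")
    # Fine-tuned weights are only usable together with a reference speech.
    if params.get("weights") and not params.get("reference"):
        raise ValueError("'weights' requires 'reference' to be provided as well")

    merged = {**DEFAULTS, **params}
    for name, (low, high) in RANGES.items():
        value = merged[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the range {low} - {high}")
    if not isinstance(merged["diffusion_steps"], int):
        raise ValueError("'diffusion_steps' must be an integer")
    return merged
```

Filling defaults on the client side is optional, since omitted parameters fall back to the same defaults on the model side.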
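The descriptions of `alpha` and `beta` match the style-mixing step in the public StyleTTS 2 inference demo, where the style vector sampled from the text is split into an acoustic half (timbre) and a prosodic half, and each half is linearly interpolated with the reference style. The sketch below assumes this Replicate model follows the same scheme, which this page does not confirm; `mix_style` is a hypothetical name, and plain lists stand in for tensors.

```python
def mix_style(predicted, reference, alpha=0.3, beta=0.7):
    """Blend a text-predicted style vector with a reference style vector.

    Both vectors are assumed to be [acoustic | prosodic] concatenations of
    equal-sized halves (128 + 128 dimensions in the StyleTTS 2 demo).
    alpha weights the predicted acoustic half (timbre), beta the predicted
    prosodic half; lower values lean more on the reference.
    """
    if len(predicted) != len(reference) or len(predicted) % 2:
        raise ValueError("vectors must have the same even length")
    half = len(predicted) // 2
    acoustic = [
        alpha * p + (1 - alpha) * r
        for p, r in zip(predicted[:half], reference[:half])
    ]
    prosodic = [
        beta * p + (1 - beta) * r
        for p, r in zip(predicted[half:], reference[half:])
    ]
    return acoustic + prosodic
```

With `alpha=0` and `beta=0` the result is purely the reference style; with both set to 1 it is purely the text-predicted style.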

Output Schema

Output

Type: string • Format: uri
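Because the output is a URI pointing at the generated audio, a client typically downloads it to a local file. A minimal sketch using only the standard library; `save_output` is a hypothetical name, and the fallback file name `output.wav` is an assumption about the audio format, not something this page states.

```python
import os
import shutil
import urllib.parse
import urllib.request


def save_output(uri: str, dest_dir: str = ".") -> str:
    """Download the file at `uri` into dest_dir and return the local path.

    The file name is taken from the last path segment of the URI; if that
    is empty, the hypothetical default "output.wav" is used.
    """
    path_part = urllib.parse.unquote(urllib.parse.urlparse(uri).path)
    name = os.path.basename(path_part) or "output.wav"
    local_path = os.path.join(dest_dir, name)
    # Stream the response to disk instead of reading it all into memory.
    with urllib.request.urlopen(uri) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```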

Version Details

Version ID: 989cb5ea6d2401314eb30685740cb9f6fd1c9001b8940659b406f952837ab5ac
Version Created: January 31, 2024
