adirik/styletts2
About
Generates speech from text.

Example Output
Performance Metrics
- Prediction Time: 5.43s
- Total Time: 7.22s
All Input Parameters
```json
{
  "beta": 0.7,
  "seed": 0,
  "text": "StyleTTS 2 is a text-to-speech model that leverages style diffusion and adversarial training with large speech language models to achieve human-level text-to-speech synthesis.",
  "alpha": 0.3,
  "diffusion_steps": 10,
  "embedding_scale": 1.5
}
```
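The example input above can be sent with the Replicate Python client. The sketch below only assembles the payload with this page's defaults; `build_input` is a hypothetical helper, and the actual `replicate.run` call (shown commented out) requires a `REPLICATE_API_TOKEN` and network access.

```python
# Sketch: calling adirik/styletts2 via the Replicate Python client.
# Model identifier and version ID are taken from this page.
MODEL = "adirik/styletts2:989cb5ea6d2401314eb30685740cb9f6fd1c9001b8940659b406f952837ab5ac"

def build_input(text, alpha=0.3, beta=0.7, seed=0,
                diffusion_steps=10, embedding_scale=1.5):
    """Assemble the input payload using the defaults shown on this page."""
    return {
        "text": text,
        "alpha": alpha,
        "beta": beta,
        "seed": seed,
        "diffusion_steps": diffusion_steps,
        "embedding_scale": embedding_scale,
    }

# To run the prediction (requires `pip install replicate` and an API token):
#   import replicate
#   audio_url = replicate.run(MODEL, input=build_input("Hello from StyleTTS 2."))
```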
Input Parameters
- beta: Only used for long text inputs or when a reference speaker is given; determines the prosody of the speaker. Lower values sample style from the previous or reference speech instead of the text.
- seed: Seed for reproducibility.
- text (required): Text to convert to speech.
- alpha: Only used for long text inputs or when a reference speaker is given; determines the timbre of the speaker. Lower values sample style from the previous or reference speech instead of the text.
- weights: Replicate weights URL for inference with a model fine-tuned on new speakers. If provided, a reference speech must also be provided; otherwise the default model is used.
- reference: Reference speech to copy style from.
- diffusion_steps: Number of diffusion steps.
- embedding_scale: Embedding scale; use higher values for more pronounced emotion.
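The `alpha` and `beta` descriptions above amount to a convex blend between a style predicted from the text and one taken from the previous or reference speech. A minimal sketch of that idea, with hypothetical function and vector names (not StyleTTS 2's actual code):

```python
def blend_style(alpha, beta, text_timbre, ref_timbre, text_prosody, ref_prosody):
    """Hypothetical sketch of the alpha/beta blending described above:
    lower alpha/beta lean the style toward the reference speech."""
    timbre = [alpha * t + (1 - alpha) * r for t, r in zip(text_timbre, ref_timbre)]
    prosody = [beta * t + (1 - beta) * r for t, r in zip(text_prosody, ref_prosody)]
    return timbre, prosody

# alpha=0 ignores the text-derived timbre entirely; beta=1 ignores the reference prosody
timbre, prosody = blend_style(0.0, 1.0, [1.0], [5.0], [2.0], [4.0])
```

With the page defaults (alpha=0.3, beta=0.7), the timbre leans toward the reference while the prosody leans toward the text-predicted style.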
Output Schema
Version Details
- Version ID: 989cb5ea6d2401314eb30685740cb9f6fd1c9001b8940659b406f952837ab5ac
- Version Created: January 31, 2024