cjwbw/parler-tts 📝 → 🔊

▶️ 2.6K runs 📅 Apr 2024 ⚙️ Cog 0.9.4
controllable-tts text-to-speech

About

A lightweight text-to-speech (TTS) model trained on 10.5K hours of audio data.

Example Output

Prompt:

"Remember - this is only the first iteration of the model! To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data by a factor of five times."

Performance Metrics

16.38s Prediction Time
145.06s Total Time
All Input Parameters
{
  "prompt": "Remember - this is only the first iteration of the model! To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data by a factor of five times.",
  "description": "A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone."
}
Input Parameters
prompt Type: string Default: Hey, how are you doing today?
Text for audio generation
description Type: string Default: A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast.
Description of the desired output audio (speaker, pitch, pace, acoustics, tone)
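
The two inputs above map directly onto a prediction request. The following is a minimal sketch, assuming the official `replicate` Python client is installed (`pip install replicate`) and `REPLICATE_API_TOKEN` is set in the environment. `build_input` and `synthesize` are hypothetical helper names introduced only for illustration; the version ID is the one listed under Version Details on this page.

```python
# Model reference: owner/name plus the version ID from this page.
MODEL_REF = (
    "cjwbw/parler-tts:"
    "bf38249a8cc143b97b5108570d1c81b8321881dd91fe7837877e7dfa3a0fad27"
)

# Defaults copied from the Input Parameters listing.
DEFAULT_PROMPT = "Hey, how are you doing today?"
DEFAULT_DESCRIPTION = (
    "A female speaker with a slightly low-pitched voice delivers her words "
    "quite expressively, in a very confined sounding environment with clear "
    "audio quality. She speaks very fast."
)


def build_input(prompt=DEFAULT_PROMPT, description=DEFAULT_DESCRIPTION):
    """Hypothetical helper: assemble the input payload for one prediction."""
    if not prompt.strip():
        raise ValueError("prompt must contain some text to speak")
    return {"prompt": prompt, "description": description}


def synthesize(prompt=DEFAULT_PROMPT, description=DEFAULT_DESCRIPTION):
    """Hypothetical helper: run the model and return whatever the client returns.

    Assumption: depending on the client version, the result is either a URL
    string or a file-like object wrapping that URL.
    """
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL_REF, input=build_input(prompt, description))
```

Calling `synthesize("Hello there.", "A male speaker with a calm, slow voice.")` would submit one prediction. Keeping the payload construction in its own function makes it possible to check the inputs without making a network call.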
Output Schema

Output

Type: string Format: uri
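
Because the output is a URI rather than raw audio bytes, the file has to be fetched in a second step. The sketch below uses only the Python standard library. `filename_from_uri` and `download_audio` are hypothetical helper names, and the fallback filename `output.wav` is an assumption, since this page does not state the audio format.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def filename_from_uri(uri, fallback="output.wav"):
    """Hypothetical helper: take the last path segment of the URI as a filename.

    The fallback name (and its .wav extension) is an assumption, used only
    when the URI has no path segment.
    """
    name = PurePosixPath(urlparse(uri).path).name
    return name or fallback


def download_audio(uri, dest_dir="."):
    """Hypothetical helper: fetch the output URI and save it under dest_dir."""
    target = Path(dest_dir) / filename_from_uri(uri)
    with urllib.request.urlopen(uri) as response:
        target.write_bytes(response.read())
    return target
```

The query string is ignored when deriving the filename, so a signed or parameterized URI still yields a clean local name.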

Example Execution Logs
Using the model-agnostic default `max_length` (=2580) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
Version Details
Version ID
bf38249a8cc143b97b5108570d1c81b8321881dd91fe7837877e7dfa3a0fad27
Version Created
April 15, 2024