cjwbw/parler-tts 📝 → 🔊

▶️ 3.0K runs 📅 Apr 2024 ⚙️ Cog 0.9.4 🔗 GitHub ⚖️ License

controllable-tts controllable-voice text-to-speech

Performance

16.4s Typical run time

~145s Cold start (first call)

3.0K Total runs

About

A lightweight text-to-speech (TTS) model trained on 10.5K hours of audio data.

Example Output

Prompt:

"Remember - this is only the first iteration of the model! To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data by a factor of five times."

Output

Performance Metrics

16.38s Prediction Time

145.06s Total Time
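
The gap between the two figures above can be worked out directly. This is a small sketch that assumes everything in Total Time outside Prediction Time is overhead (queueing, container boot, model loading), which is consistent with the ~145s cold-start figure but is not stated explicitly on this page. The variable names are illustrative.

```python
# Figures copied from the "Performance Metrics" section.
prediction_time_s = 16.38
total_time_s = 145.06

# Assumption: time not spent predicting is cold-start overhead
# (queueing, container boot, model loading).
overhead_s = round(total_time_s - prediction_time_s, 2)

# Fraction of the total wall-clock time spent generating audio.
prediction_share = prediction_time_s / total_time_s

print(f"overhead: {overhead_s:.2f}s")
print(f"prediction share of total: {prediction_share:.1%}")
```

On a warm instance the overhead term should be much smaller, so the typical run time is the more useful figure for repeated calls.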

All Input Parameters

{
  "prompt": "Remember - this is only the first iteration of the model! To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data by a factor of five times.",
  "description": "A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone."
}
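
A minimal sketch of how these inputs might be packaged for Replicate's HTTP predictions endpoint, using only the Python standard library. The helper names `build_prediction_request` and `submit` are hypothetical, the version ID is the one listed under Version Details, and the token is assumed to come from your own Replicate account. Nothing is sent until `submit` is called.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
# Version ID as listed under "Version Details" on this page.
VERSION_ID = "bf38249a8cc143b97b5108570d1c81b8321881dd91fe7837877e7dfa3a0fad27"


def build_prediction_request(prompt: str, description: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    body = json.dumps(
        {
            "version": VERSION_ID,
            "input": {"prompt": prompt, "description": description},
        }
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_prediction_request(
    prompt="Hey, how are you doing today?",
    description="A male speaker with a low-pitched voice and an animated tone.",
    token="YOUR_API_TOKEN",  # placeholder, replace with a real token before calling submit()
)
```

Predictions are created asynchronously, so the response from `submit(request)` would typically need to be polled until the prediction finishes before the output URI is available.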

Input Parameters

prompt
Type: string • Default: Hey, how are you doing today?
Text for audio generation

description
Type: string • Default: A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast.
Provide a description of the output audio
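
Since both parameters have defaults, either can be omitted. One way to mirror that on the client side is a small dataclass; this is an illustrative sketch, and the class name `ParlerTTSInput` is hypothetical rather than part of any library.

```python
from dataclasses import asdict, dataclass


@dataclass
class ParlerTTSInput:
    """Client-side mirror of the two documented inputs and their defaults."""

    # Text for audio generation.
    prompt: str = "Hey, how are you doing today?"
    # Description of the output audio (voice, pace, environment, quality).
    description: str = (
        "A female speaker with a slightly low-pitched voice delivers her words "
        "quite expressively, in a very confined sounding environment with clear "
        "audio quality. She speaks very fast."
    )


# Override only the text; the voice description falls back to the default.
inputs = ParlerTTSInput(prompt="Welcome to the demo.")
payload = asdict(inputs)
print(payload["prompt"])
```

The resulting `payload` dictionary has the same shape as the "All Input Parameters" example above.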

Output Schema

Output

Type: string • Format: uri
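
Because the output is a single URI string, a client will usually want to check it and then fetch the audio file. The sketch below uses only the standard library. The function names are hypothetical, and the destination filename passed to `download_output` is an assumption, since this page does not state the audio format.

```python
import urllib.request
from urllib.parse import urlparse


def is_valid_output_uri(value: object) -> bool:
    """Return True if value looks like an absolute http(s) URI."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def download_output(uri: str, dest: str) -> str:
    """Download the generated audio to dest (needs network access)."""
    if not is_valid_output_uri(uri):
        raise ValueError(f"not a usable output URI: {uri!r}")
    urllib.request.urlretrieve(uri, dest)
    return dest


# Example checks with placeholder values; no download happens here.
print(is_valid_output_uri("https://example.com/output-audio"))
print(is_valid_output_uri("output-audio"))
```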

Example Execution Logs

Using the model-agnostic default `max_length` (=2580) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.

Version Details

Version ID: bf38249a8cc143b97b5108570d1c81b8321881dd91fe7837877e7dfa3a0fad27
Version Created: April 15, 2024
