playht/play-dialog 🔢📝❓ → 🖼️

⭐ Official ▶️ 27.1K runs 📅 Jan 2025 ⚙️ Cog 0.13.6

conversational-tts emotion-tts multi-speaker-tts multi-voice-dialogue multilingual text-to-speech

Performance

26.5s Typical run time

27.1K Total runs

About

End-to-end AI speech model designed for natural-sounding conversational speech synthesis, with support for context-aware prosody, intonation, and emotional expression.

Example Output

Output

Performance Metrics

26.46s Prediction Time

26.47s Total Time

All Input Parameters

{
  "text": "Close your eyes gently. Take a deep breath in through your nose, allowing your lungs to fill completely. Hold it for a moment, then exhale slowly and deeply through your mouth. Let any tension you're holding melt away with each exhale. Visualize yourself standing at the edge of a beautiful, tranquil forest. The first rays of morning light stream through the trees, illuminating the delicate dewdrops on leaves and petals. Birds are beginning to sing their morning songs, and the world feels fresh and alive.",
  "voice": "Nia (Young female US conversational voice)",
  "prompt": "",
  "prompt2": "",
  "voice_2": "None",
  "language": "english",
  "turnPrefix": "Voice 1:",
  "temperature": 1.02,
  "turnPrefix2": "Voice 2:",
  "voice_conditioning_seconds": 20,
  "voice_conditioning_seconds_2": 20
}
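
The example above uses a single voice, so the `turnPrefix` values go unused. For a two-voice run, the parameter descriptions suggest that each turn in `text` starts with the matching prefix. The sketch below assembles such a transcript. The helper name `format_dialogue` is hypothetical, and joining turns with newlines is an assumption rather than something this page documents.

```python
from typing import Iterable, Tuple


def format_dialogue(
    turns: Iterable[Tuple[int, str]],
    prefix_1: str = "Voice 1:",
    prefix_2: str = "Voice 2:",
) -> str:
    """Join (speaker, line) pairs into one transcript string.

    Speaker 1 lines get prefix_1 and speaker 2 lines get prefix_2.
    Newline-separated turns are an assumption, not documented behavior.
    """
    prefixes = {1: prefix_1, 2: prefix_2}
    lines = []
    for speaker, line in turns:
        if speaker not in prefixes:
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        lines.append(f"{prefixes[speaker]} {line.strip()}")
    return "\n".join(lines)


script = format_dialogue([
    (1, "Did you sleep well?"),
    (2, "Better than I have in weeks."),
])
print(script)
```

The resulting string would be passed as `text`, with `voice_2` set to a real second voice instead of `None`.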

Input Parameters

seed Type: integer: Random seed. Set for reproducible generation.
text (required) Type: string: Text for speech generation.
speed Type: number, Default: 1, Range: 0.1 - 5: Control how fast the generated audio should be.
voice Default: Angelo (Young male US conversational voice): Voice to use for generation.
prompt Type: string, Default: (empty string): A prompt to guide the style of the output generated by the first voice.
prompt2 Type: string, Default: (empty string): A prompt to guide the style of the output generated by the second voice.
voice_2 Default: None: Optional second voice to use for generation.
language Default: english: The language of the text to be spoken.
turnPrefix Type: string, Default: "Voice 1:": The prefix to indicate the start of a turn in a multi-turn dialogue for the first voice.
temperature Type: number, Default: 1, Range: 0 - 2: The temperature parameter controls variance. Lower temperatures result in more predictable results, higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice.
turnPrefix2 Type: string, Default: "Voice 2:": The prefix to indicate the start of a turn in a multi-turn dialogue for the second voice.
voice_conditioning_seconds Type: integer, Default: 20, Range: 1 - 60: The number of seconds of conditioning to use from the selected voice. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness.
voice_conditioning_seconds_2 Type: integer, Default: 20, Range: 1 - 60: The number of seconds of conditioning to use from the second selected voice.
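
Because several parameters have documented ranges, it can help to check values locally before submitting a run. The following is a minimal sketch under that assumption. The `RANGES` table is copied from the list above, while the `build_input` helper is hypothetical and not part of any Replicate or PlayHT library.

```python
from typing import Any, Dict

# Numeric ranges copied from the parameter list: name -> (min, max).
RANGES = {
    "speed": (0.1, 5),
    "temperature": (0, 2),
    "voice_conditioning_seconds": (1, 60),
    "voice_conditioning_seconds_2": (1, 60),
}


def build_input(text: str, **overrides: Any) -> Dict[str, Any]:
    """Return an input dict, rejecting values outside the listed ranges.

    Parameters that are not overridden are left out, so the model's
    own defaults apply to them.
    """
    if not text:
        raise ValueError("text is required")
    for name, value in overrides.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} - {high}")
    return {"text": text, **overrides}


payload = build_input(
    "Close your eyes gently.",
    voice="Nia (Young female US conversational voice)",
    temperature=1.02,
)
print(payload)
```

This sketch does not check voice names or languages, since the full list of accepted values is not given here.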

Output Schema

Output

Type: string • Format: uri
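
Since the output is a single URI string, a caller mainly needs to run the model and confirm that what comes back looks like a link to the audio. The sketch below takes the runner as an argument so it can be exercised without network access. The `synthesize` helper is hypothetical. With the `replicate` Python client installed and `REPLICATE_API_TOKEN` set, `replicate.run` is the callable one would likely pass in. Recent client versions may return a file-like object rather than a plain string, which is why the result is converted with `str()`. Treat that detail as an assumption to verify against the client version in use.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlparse


def synthesize(run: Callable[..., Any], model_input: Dict[str, Any]) -> str:
    """Call the model through `run` and return the audio URI as a string."""
    output = run("playht/play-dialog", input=model_input)
    uri = str(output)
    parsed = urlparse(uri)
    # The output schema above declares a URI, so reject anything else.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URI, got {uri!r}")
    return uri


# Possible real usage (needs network access and an API token):
#   import replicate
#   url = synthesize(replicate.run, {"text": "Hello there."})
```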

Example Execution Logs

Using seed: 1958393623
Running prediction... 
Generating audio...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Still processing...
Generated audio in 26.1sec
Length of generated audio: 39.24 seconds
Downloading 785805 bytes
Downloaded 0.75MB in 0.28sec
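
The figures in these logs allow a few quick derived numbers. The sketch below parses three of the log lines and computes the real-time factor (generation time divided by audio length), the average bitrate of the downloaded file, and its size in MiB. The regular expressions assume the exact log wording shown above. The `summarize` helper is hypothetical.

```python
import re

LOG = """Generated audio in 26.1sec
Length of generated audio: 39.24 seconds
Downloading 785805 bytes"""


def summarize(log: str) -> dict:
    """Extract timing and size figures and derive simple ratios."""
    gen = float(re.search(r"Generated audio in ([\d.]+)sec", log).group(1))
    length = float(
        re.search(r"Length of generated audio: ([\d.]+) seconds", log).group(1)
    )
    size = int(re.search(r"Downloading (\d+) bytes", log).group(1))
    return {
        "real_time_factor": gen / length,   # below 1 means faster than playback
        "kbps": size * 8 / length / 1000,   # average bitrate of the file
        "mib": size / 2**20,                # size in binary megabytes
    }


stats = summarize(LOG)
print({name: round(value, 3) for name, value in stats.items()})
```

For this run the real-time factor is about 0.665, so the audio was generated faster than it plays back. The size comes to roughly 0.75 MiB, which matches the logged "0.75MB" and suggests the log uses binary megabytes.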

Version Details

Version ID: 0d5710136b2204bb0a8b927a9e50904af22c2d238b813b7e0cdf8f17f12670f8
Version Created: January 13, 2025
