playht/play-dialog

Official · 26.8K runs · Jan 2025 · Cog 0.13.6
Tags: conversational-tts, multi-speaker-tts, text-to-speech

About

End-to-end AI speech model designed for natural-sounding conversational speech synthesis, with support for context-aware prosody, intonation, and emotional expression.

Example Output

Performance Metrics

Prediction time: 26.46 s
Total time: 26.47 s
All Input Parameters
{
  "text": "Close your eyes gently. Take a deep breath in through your nose, allowing your lungs to fill completely. Hold it for a moment, then exhale slowly and deeply through your mouth. Let any tension you're holding melt away with each exhale. Visualize yourself standing at the edge of a beautiful, tranquil forest. The first rays of morning light stream through the trees, illuminating the delicate dewdrops on leaves and petals. Birds are beginning to sing their morning songs, and the world feels fresh and alive.",
  "voice": "Nia (Young female US conversational voice)",
  "prompt": "",
  "prompt2": "",
  "voice_2": "None",
  "language": "english",
  "turnPrefix": "Voice 1:",
  "temperature": 1.02,
  "turnPrefix2": "Voice 2:",
  "voice_conditioning_seconds": 20,
  "voice_conditioning_seconds_2": 20
}
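The example above uses a single voice (`voice_2` is "None"). For a two-speaker dialogue, each turn in `text` presumably starts with the matching `turnPrefix` or `turnPrefix2` value. The sketch below assumes turns are newline-separated lines such as `Voice 1: ...`. The newline separator and the helper name `build_dialogue_input` are hypothetical, not taken from the model's documentation. The two voice names are the ones that appear on this page.

```python
import json


def build_dialogue_input(turns, voice, voice_2,
                         prefix="Voice 1:", prefix2="Voice 2:"):
    """Build a hypothetical two-voice input payload for play-dialog.

    `turns` is a list of (speaker, line) pairs where speaker is 1 or 2.
    Assumption: turns are joined with newlines, and each turn starts with
    the turnPrefix of its speaker.
    """
    prefixes = {1: prefix, 2: prefix2}
    lines = []
    for speaker, line in turns:
        if speaker not in prefixes:
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        lines.append(f"{prefixes[speaker]} {line}")
    return {
        "text": "\n".join(lines),
        "voice": voice,
        "voice_2": voice_2,
        # Keep the prefixes in the payload so they match the text above.
        "turnPrefix": prefix,
        "turnPrefix2": prefix2,
    }


payload = build_dialogue_input(
    [(1, "Did you sleep well?"), (2, "Like a log, thanks.")],
    voice="Nia (Young female US conversational voice)",
    voice_2="Angelo (Young male US conversational voice)",
)
print(json.dumps(payload, indent=2))
```

Passing the prefixes as explicit fields, rather than relying on the defaults, keeps the text and the `turnPrefix` inputs consistent if you later rename the speakers.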
Input Parameters
seed (integer)
Random seed. Set it for reproducible generation.
text (string, required)
Text for speech generation.
speed (number, default: 1, range: 0.1 to 5)
Controls how fast the generated audio is.
voice (default: "Angelo (Young male US conversational voice)")
Voice to use for generation.
prompt (string, default: empty)
A prompt to guide the style of the output generated by the first voice.
prompt2 (string, default: empty)
A prompt to guide the style of the output generated by the second voice.
voice_2 (default: None)
Optional second voice to use for generation.
language (default: english)
The language of the text to be spoken.
turnPrefix (string, default: "Voice 1:")
The prefix that marks the start of a turn in a multi-turn dialogue for the first voice.
temperature (number, default: 1, range: 0 to 2)
Controls variance. Lower temperatures give more predictable results. Higher temperatures let each run vary more, so the voice may sound less like the baseline voice.
turnPrefix2 (string, default: "Voice 2:")
The prefix that marks the start of a turn in a multi-turn dialogue for the second voice.
voice_conditioning_seconds (integer, default: 20, range: 1 to 60)
The number of seconds of conditioning audio to use from the selected voice. Lower values produce audio less similar to the cloned voice, but give more model stability and expressiveness. Higher values produce output more similar to the cloned voice, but can cause model instability and reduced expressiveness.
voice_conditioning_seconds_2 (integer, default: 20, range: 1 to 60)
The number of seconds of conditioning audio to use from the second selected voice (voice_2). The same trade-off applies as for voice_conditioning_seconds.
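A local pre-flight check against the documented types and ranges can catch mistakes before a prediction is submitted. This is a minimal sketch: `NUMERIC_SPECS` and `check_inputs` are hypothetical names. It assumes the ranges listed above are inclusive on both ends. The model's own validation remains authoritative.

```python
# Documented ranges for play-dialog's numeric inputs (bounds assumed inclusive).
NUMERIC_SPECS = {
    # name: (minimum, maximum, must_be_integer)
    "speed": (0.1, 5.0, False),
    "temperature": (0.0, 2.0, False),
    "voice_conditioning_seconds": (1, 60, True),
    "voice_conditioning_seconds_2": (1, 60, True),
}


def check_inputs(inputs):
    """Return a list of problems found in `inputs` (empty list if none)."""
    problems = []
    if not isinstance(inputs.get("text"), str):
        problems.append("text is required and must be a string")
    for name, (lo, hi, must_be_int) in NUMERIC_SPECS.items():
        if name not in inputs:
            continue  # omitted inputs fall back to the model's defaults
        value = inputs[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif must_be_int and not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif not lo <= value <= hi:
            problems.append(f"{name}={value} is outside {lo}..{hi}")
    seed = inputs.get("seed")
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        problems.append("seed must be an integer")
    return problems


# Shortened version of the example input shown earlier on this page.
example = {
    "text": "Close your eyes gently.",
    "voice": "Nia (Young female US conversational voice)",
    "voice_2": "None",
    "temperature": 1.02,
    "voice_conditioning_seconds": 20,
    "voice_conditioning_seconds_2": 20,
}
bad = {"text": "Hi", "speed": 6, "voice_conditioning_seconds": 20.5}

print(check_inputs(example))
print(check_inputs(bad))
```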
Output Schema

Type: string, format: uri. The prediction returns a URL pointing to the generated audio file.
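Because the output is just a URI string, saving the audio needs only the standard library. This sketch assumes you already have the output as a plain URL string. The audio format and file extension are not documented on this page, so the destination filename is up to you. `save_output` is a hypothetical helper name. The demo uses a `file://` URL as an offline stand-in for the real output URL.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> int:
    """Download the file at `uri` to `dest` and return the bytes written.

    Works for https:// URLs and also for file:// URLs (used below so the
    demo runs without network access).
    """
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream in chunks, not all at once
    return Path(dest).stat().st_size


# Offline demo: a local file stands in for the model's output URL.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "fake_output.bin")
    src.write_bytes(b"\x00" * 1024)
    written = save_output(src.as_uri(), str(Path(tmp, "copy.bin")))
print(written)
```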

Example Execution Logs
Using seed: 1958393623
Running prediction... 
Generating audio...
Still processing...  [line repeated 22 times]
Generated audio in 26.1sec
Length of generated audio: 39.24 seconds
Downloading 785805 bytes
Downloaded 0.75MB in 0.28sec
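The logs give enough to derive a few useful figures. The model took 26.1 s to produce 39.24 s of audio, which is faster than real time. The 785805-byte file matches the reported "0.75MB" when MB is read as mebibytes. The sketch below parses these values. It assumes the log format shown in this example, which may change between versions. It also shows where to find the seed that was used when none was set. Passing that value back as `seed`, with otherwise identical inputs, should reproduce the run, since `seed` is documented for reproducible generation. The bitrate figure assumes the downloaded file contains exactly the generated audio.

```python
import re

# Lines copied from the example execution log above.
LOG = """Using seed: 1958393623
Running prediction...
Generating audio...
Generated audio in 26.1sec
Length of generated audio: 39.24 seconds
Downloading 785805 bytes
Downloaded 0.75MB in 0.28sec"""


def summarize_log(log: str) -> dict:
    """Extract seed, speed, and size figures from a play-dialog log (format assumed)."""
    def grab(pattern):
        match = re.search(pattern, log)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return match.group(1)

    seed = int(grab(r"Using seed: (\d+)"))
    gen_seconds = float(grab(r"Generated audio in ([\d.]+)sec"))
    audio_seconds = float(grab(r"Length of generated audio: ([\d.]+) seconds"))
    size_bytes = int(grab(r"Downloading (\d+) bytes"))
    return {
        "seed": seed,
        # Real-time factor: generation time per second of audio (< 1 is faster than real time).
        "real_time_factor": gen_seconds / audio_seconds,
        # Average bitrate of the downloaded file in kilobits per second.
        "kbps": size_bytes * 8 / audio_seconds / 1000,
        # Size in mebibytes (2**20 bytes), which matches the log's "MB".
        "size_mib": size_bytes / 2**20,
    }


summary = summarize_log(LOG)
print({k: round(v, 3) if isinstance(v, float) else v for k, v in summary.items()})
```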
Version Details
Version ID: 0d5710136b2204bb0a8b927a9e50904af22c2d238b813b7e0cdf8f17f12670f8
Version Created: January 13, 2025
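To keep results stable when the model is updated, you can pin this version ID. This sketch builds two things. The first is the `owner/name:version` reference string that Replicate's client libraries accept, for example in the Python client's `replicate.run`. The second is a JSON body for Replicate's HTTP create-prediction endpoint, assumed here to be `POST https://api.replicate.com/v1/predictions`, which takes `version` and `input` fields. Check the exact endpoint and field names against Replicate's API reference. `pinned_ref` and `prediction_body` are hypothetical helper names. Nothing is sent over the network.

```python
import json

MODEL = "playht/play-dialog"
VERSION_ID = "0d5710136b2204bb0a8b927a9e50904af22c2d238b813b7e0cdf8f17f12670f8"


def pinned_ref() -> str:
    """Model reference pinned to one version, in owner/name:version form."""
    return f"{MODEL}:{VERSION_ID}"


def prediction_body(inputs: dict) -> str:
    """JSON body for a create-prediction request pinned to this version (assumed schema)."""
    return json.dumps({"version": VERSION_ID, "input": inputs})


# Reusing the seed from the example log to aim for a repeatable run.
body = prediction_body({"text": "Hello there.", "seed": 1958393623})
print(pinned_ref())
print(body)
```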