prunaai/dia-1.6b
About

Example Output
Output
Performance Metrics
- Prediction Time: 20.49s
- Total Time: 20.50s
All Input Parameters
{
  "seed": -1,
  "text": "[S1] It's on Replicate!!! Oh fire! Oh my goodness! What's the procedure? What do we do people? The Dia text-to-speech model — now Pruna-optimized — just dropped on Replicate!!\n\n[S2] Oh my god! Okay… it's happening. Everybody stay calm!\n\n[S1] What's the procedure…\n\n[S2] Everybody stay fricking calm!!!... Everybody fudging calm down!!!!!\n\n[S1] Yes! Yes! Let's try it out at prunaai/dia-1.6b (laughs) — powered up and made leaner with Pruna!\n\n[S2] (whispers) try it now… (whispers) turbocharged by Pruna…",
  "top_p": 0.95,
  "cfg_scale": 3,
  "temperature": 1.3,
  "speed_factor": 0.94,
  "max_new_tokens": 3072,
  "cfg_filter_top_k": 35
}
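The `text` value in the example input uses speaker tags (`[S1]`, `[S2]`), parenthesized non-verbal cues, and a blank line between turns. A minimal sketch of assembling such a script from structured turns follows. `format_dialogue` is a hypothetical helper, not part of the model's API, and restricting speakers to 1 and 2 is an assumption based on the tags shown on this page.

```python
from typing import Iterable, Tuple


def format_dialogue(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into a tagged dialogue script.

    Hypothetical helper: the tag style and blank-line separator mirror
    the example input on this page, not a documented API.
    """
    parts = []
    for speaker, line in turns:
        # Assumption: only the [S1] and [S2] tags are supported.
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker number: {speaker}")
        line = line.strip()
        if not line:
            raise ValueError("each turn needs non-empty text")
        parts.append(f"[S{speaker}] {line}")
    # Turns are separated by a blank line, as in the example input.
    return "\n\n".join(parts)


script = format_dialogue([
    (1, "Let's try it out! (laughs)"),
    (2, "(whispers) try it now"),
])
```

The resulting string can be passed as the `text` input; non-verbal cues such as `(laughs)` are written inline within a turn.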
Input Parameters
- seed: Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.
- text (required): Input text for dialogue generation. Use [S1] and [S2] to indicate different speakers, and put non-verbal cues in parentheses, e.g., (laughs), (whispers).
- top_p: Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.
- cfg_scale: Controls how closely the audio follows your text. Higher values (3-5) follow the text more strictly; lower values may sound more natural but deviate more.
- temperature: Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values (0.1-0.9) make output more consistent and predictable.
- speed_factor: Adjusts playback speed of the generated audio. Values below 1.0 slow down the audio; 1.0 is original speed.
- max_new_tokens: Controls the length of the generated audio. Higher values create longer audio (86 tokens ≈ 1 second of audio).
- cfg_filter_top_k: Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio.
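Since roughly 86 tokens correspond to one second of audio, `max_new_tokens` can be derived from a target duration. The sketch below is one way to do that and to assemble the input dictionary. `tokens_for_seconds`, `build_input`, and `run_prediction` are hypothetical helper names. The final call assumes the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment.

```python
import math

# Approximate rate taken from the max_new_tokens description above.
TOKENS_PER_SECOND = 86


def tokens_for_seconds(seconds: float) -> int:
    """Estimate the token budget needed for a target audio duration."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * TOKENS_PER_SECOND)


def build_input(text: str, target_seconds: float, **overrides) -> dict:
    """Build an input dict; unspecified parameters keep the model defaults."""
    if not text.strip():
        raise ValueError("text is required")
    params = {
        "text": text,
        "max_new_tokens": tokens_for_seconds(target_seconds),
    }
    # Optional overrides such as temperature, cfg_scale, or seed.
    params.update(overrides)
    return params


def run_prediction(params: dict):
    """Send the input to Replicate (assumes the `replicate` package and an API token)."""
    import replicate  # imported lazily so the helpers above work without it

    version = "5e364f84db4cd1916990138229e14179036a1bfcf20a39c4b0f3214e33a5c48a"
    return replicate.run(f"prunaai/dia-1.6b:{version}", input=params)


params = build_input("[S1] Hello there.\n\n[S2] (laughs) Hi!", 10, temperature=1.3)
```

By this estimate, the example's `max_new_tokens` of 3072 allows up to about 35 seconds of audio. The figure is an upper bound rather than a target, since generation can end earlier.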
Output Schema
Output
Example Execution Logs
Random seed set to 16609
Generating audio tokens...
Token generation finished in 20.46 seconds.
Generated audio shape: (814080,)
Adjusting speed by factor 0.94...
Resampled audio from 814080 to 866042 samples.
Saving audio to /tmp/tmpnfzzw8pj/output.wav...
Audio saved in 0.02 seconds.
Total prediction time: 20.48 seconds.
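The sample counts in the log are consistent with dividing the original length by `speed_factor` and truncating to an integer. The sketch below reproduces that arithmetic. It is an inference from the logged numbers, not the model's actual resampling code, and the 44.1 kHz sample rate used for the duration estimate is an assumption that the log does not state.

```python
def resampled_length(num_samples: int, speed_factor: float) -> int:
    """Sample count after a speed adjustment (truncating division, inferred from the log)."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return int(num_samples / speed_factor)


def duration_seconds(num_samples: int, sample_rate: int = 44100) -> float:
    """Audio duration in seconds; the 44100 Hz default is an assumption."""
    return num_samples / sample_rate


original = 814080  # "Generated audio shape" from the log
adjusted = resampled_length(original, 0.94)
print(adjusted)  # 866042, matching the "Resampled audio" log line
```

Under the assumed sample rate, the clip runs roughly 18.5 seconds before the speed adjustment and roughly 19.6 seconds after it, which fits a `speed_factor` below 1.0 slowing the audio down.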
Version Details
- Version ID: 5e364f84db4cd1916990138229e14179036a1bfcf20a39c4b0f3214e33a5c48a
- Version Created: April 24, 2025