adirik/hierspeechpp
About
Zero-shot speech synthesizer for text-to-speech and voice conversion
Example Output
Performance Metrics
- Prediction Time: 3.19s
- Total Time: 95.60s
All Input Parameters
{
"input_text": "And lay me down in my cold bed and leave my shining lot.",
"target_voice": "https://replicate.delivery/pbxt/K30ke0FQUcGCa4gQdyhEPaEeGvEwDZmEK3SMtXaoujJSMlSE/reference_1.wav",
"denoise_ratio": 0,
"output_sample_rate": 16000,
"scale_output_volume": false,
"text_to_vector_temperature": 0.33,
"voice_conversion_temperature": 0.33
}
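The parameters above can also be submitted programmatically. The following is a minimal sketch that assumes Replicate's HTTP predictions endpoint (`POST https://api.replicate.com/v1/predictions`) with a bearer token read from a `REPLICATE_API_TOKEN` environment variable. The helper name `build_prediction_request` is hypothetical. The sketch only constructs the request and never sends it.

```python
import json
import os
import urllib.request

# Version ID listed under "Version Details" for adirik/hierspeechpp.
VERSION_ID = "ff5bcc71dc2c44662291fc348b9ca2eb40107c9f4b377b169fc0dea950c388c8"
# Assumption: Replicate's HTTP endpoint for creating predictions.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(inputs: dict, token: str) -> urllib.request.Request:
    """Hypothetical helper: wrap model inputs in a prediction request (not sent)."""
    body = json.dumps({"version": VERSION_ID, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same values as the example input parameters above.
example_inputs = {
    "input_text": "And lay me down in my cold bed and leave my shining lot.",
    "target_voice": "https://replicate.delivery/pbxt/K30ke0FQUcGCa4gQdyhEPaEeGvEwDZmEK3SMtXaoujJSMlSE/reference_1.wav",
    "denoise_ratio": 0,
    "output_sample_rate": 16000,
    "scale_output_volume": False,
    "text_to_vector_temperature": 0.33,
    "voice_conversion_temperature": 0.33,
}

req = build_prediction_request(example_inputs, os.environ.get("REPLICATE_API_TOKEN", ""))
# To actually submit (needs network access and a valid token):
# urllib.request.urlopen(req)
```

Pinning the full version ID keeps results tied to the December 14, 2023 build rather than to whatever version is newest. The official `replicate` Python client wraps the same call as `replicate.run("adirik/hierspeechpp:<version>", input=...)`.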
Input Parameters
- seed: Random seed to use for reproducibility.
- input_text: Text input to the model. If provided, it is used as the speech content of the output.
- input_sound: Sound input to the model in .wav format. If provided, it is used as the speech content of the output.
- target_voice (required): A voice clip in .wav format containing the speaker whose voice is synthesized.
- denoise_ratio: Noise reduction strength. 0 means no noise reduction and 1 means maximum noise reduction. If noise reduction is desired, a value between 0.6 and 0.8 is recommended.
- output_sample_rate: Sample rate of the output audio file.
- scale_output_volume: Volume normalization. If set to true, the output audio is scaled according to the input sound, if one is provided.
- text_to_vector_temperature: Temperature for the text-to-vector model. Larger values produce slightly more random output.
- voice_conversion_temperature: Temperature for the voice conversion model. Larger values produce slightly more random output.
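Inputs can be checked against the parameter descriptions above before a prediction is submitted. The following is a minimal sketch. The function name `validate_inputs` is hypothetical. Only the `target_voice` requirement and the 0 to 1 range of `denoise_ratio` come from the descriptions; the other checks (at least one content source, non-negative temperatures, positive integer sample rate) are assumptions.

```python
from typing import Any, Dict, List


def validate_inputs(inputs: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if OK)."""
    problems = []
    # Documented: target_voice is required.
    if not inputs.get("target_voice"):
        problems.append("target_voice is required")
    # Assumption: at least one content source (text or sound) should be given.
    if not inputs.get("input_text") and not inputs.get("input_sound"):
        problems.append("provide input_text or input_sound")
    # Documented: 0 means no noise reduction, 1 means maximum noise reduction.
    ratio = inputs.get("denoise_ratio")
    if ratio is not None and not 0 <= ratio <= 1:
        problems.append("denoise_ratio must be between 0 and 1")
    # Assumption: temperatures should be non-negative.
    for key in ("text_to_vector_temperature", "voice_conversion_temperature"):
        value = inputs.get(key)
        if value is not None and value < 0:
            problems.append(f"{key} must be non-negative")
    # Assumption: the sample rate must be a positive integer.
    rate = inputs.get("output_sample_rate")
    if rate is not None and (not isinstance(rate, int) or rate <= 0):
        problems.append("output_sample_rate must be a positive integer")
    return problems
```

For example, `validate_inputs({"target_voice": "ref.wav", "input_text": "Hello", "denoise_ratio": 0.7})` returns an empty list, because 0.7 falls inside the recommended 0.6 to 0.8 range. Collecting every problem at once, instead of raising on the first one, lets a caller report all input mistakes together.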
Version Details
- Version ID: ff5bcc71dc2c44662291fc348b9ca2eb40107c9f4b377b169fc0dea950c388c8
- Version Created: December 14, 2023