lucataco/neutts-air 📝🖼️ → 🖼️
About
Super-realistic text-to-speech (TTS) language model with instant voice cloning

Example Output
Output
Performance Metrics
- Prediction Time: 19.27s
- Total Time: 367.50s
All Input Parameters
{
  "text": "My name is Dave, and um, I'm from London.",
  "ref_text": "So I'm live on radio. And I say, well, my dear friend James here clearly, and the whole room just froze. Turns out I'd completely misspoken and mentioned our other friend.",
  "ref_audio": "https://replicate.delivery/pbxt/Nqm1eHrhRE8RIR9uAIlwDjlQY2yBswQzlkh1myYC67Ixycag/dave.wav"
}
Input Parameters
- text: The text to synthesize as speech
- ref_text: Transcript of the reference audio (what is being said in the audio file)
- ref_audio (required): Reference audio file (.wav) for voice cloning (3-15 seconds, mono, 16-44kHz)
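The ref_audio limits above (3-15 seconds, mono, 16-44kHz) can be checked locally before uploading a file. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_reference_audio` is hypothetical, and reading "16-44kHz" as 16,000 to 44,100 Hz is an assumption, since the page does not give exact bounds.

```python
import wave


def check_reference_audio(path, min_s=3.0, max_s=15.0,
                          min_hz=16_000, max_hz=44_100):
    """Return a list of problems; an empty list means the file fits the limits.

    Hypothetical helper: the bounds mirror the ref_audio description, with
    "16-44kHz" assumed to mean 16,000-44,100 Hz.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if not min_hz <= rate <= max_hz:
        problems.append(f"sample rate {rate} Hz is outside {min_hz}-{max_hz} Hz")
    if not min_s <= duration <= max_s:
        problems.append(f"duration {duration:.2f}s is outside {min_s}-{max_s}s")
    return problems
```

Calling `check_reference_audio("dave.wav")` on a local copy of the reference clip would return an empty list if the file meets all three constraints. The check only reads the WAV header, so it says nothing about audio quality or whether ref_text matches what is spoken.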
Output Schema
Output
Example Execution Logs
Encoding reference audio: /tmp/tmpby22gq5vdave.wav
Generating speech for: My name is Dave, and um, I'm from London....
Speech generated successfully: /tmp/tmpfh953fjx.wav
Version Details
- Version ID: 607e7b40e5fbaa97b828a6c71848c6b6ffdcf26c04feb6da0b1a411dfe9a7978
- Version Created: October 8, 2025
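
As an illustration of how the version ID and the three inputs fit together, here is a minimal sketch that builds (but does not send) a request for Replicate's HTTP predictions endpoint, using only the standard library. The helper name `build_prediction_request` is hypothetical, and the endpoint URL, header format, and body shape reflect the public Replicate API as generally documented rather than anything stated on this page, so treat them as assumptions to verify.

```python
import json
import urllib.request

# Version ID copied from the Version Details above.
VERSION_ID = "607e7b40e5fbaa97b828a6c71848c6b6ffdcf26c04feb6da0b1a411dfe9a7978"


def build_prediction_request(text, ref_text, ref_audio_url, api_token):
    """Build a POST request for a prediction (hypothetical helper; nothing is sent)."""
    body = {
        "version": VERSION_ID,
        "input": {
            "text": text,                # text to synthesize as speech
            "ref_text": ref_text,        # transcript of the reference audio
            "ref_audio": ref_audio_url,  # URL of the reference .wav file
        },
    }
    return urllib.request.Request(
        "https://api.replicate.com/v1/predictions",  # assumed endpoint
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the prediction. Given the total time of 367.50s in the example run against 19.27s of prediction time, a caller should expect to poll for the result rather than rely on an immediate response.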