x-lance/f5-tts
Synthesize speech from text in a cloned voice using a reference audio sample. Provide a text prompt and speaker referenc...
Found 17 models (showing 1-17)
Synthesize speech from text in a cloned voice using a reference audio sample. Provide a text prompt and speaker referenc...
Clone a target voice and generate speech audio from text. Provide a short speaker reference audio and its transcript (te...
Clone a voice from a short reference clip and generate speech from text. Accepts text and a reference audio sample; outp...
Generate speech from text in a cloned voice using a reference audio sample and its transcript. Accepts text plus speaker...
Generate speech from text conditioned on a reference voice sample. Input text and a speaker reference audio clip, and ou...
Clone a voice and synthesize speech from text. Provide a reference audio clip and its transcript plus target text to gen...
Generate expressive speech from text with zero-shot voice cloning using a reference speaker audio input. Control emotion...
Generate speech audio from text, with optional voice cloning conditioned on a reference recording. Accepts text, an opti...
Generate speech audio from text while cloning a target voice from a reference audio sample. Provide the text to speak, a...
Generate Hololive VTuber-style speech from text or convert a reference audio clip into those voices. Takes text input or...
Clone a speaker's voice and synthesize speech from text, including cross-lingual and mixed-lingual output. Accepts refer...
Generate speech from text with optional voice cloning from a reference audio sample. Accepts text plus an optional speak...
Convert speech to a target voice using RVC v2 voice models. Takes an input speech audio clip and outputs converted audio...
Generate spoken audio from text, optionally cloning a target voice from a short speaker reference audio. Accepts text as...
Convert text to expressive speech, with optional speaker style cloning from a short reference audio. Accepts text input...
Clone a voice from a short reference sample and synthesize new speech from text. Accepts text to speak, a 3–15s mono ref...
Generate speech audio from text, with optional voice cloning from a reference speaker clip. Accepts text as the primary...