elevenlabs/v2-multilingual
Generate multilingual text-to-speech audio from text input. Convert up to ~10,000 characters per request into natural, e...
Found 140 models (showing 121-140)
Generate expressive speech audio from text input. Choose from preset voices (e.g., Rachel, Drew, Paul, Aria, Domi, Dave,...
Generate multilingual speech audio from text input. Control voice selection (300+ system voices or a custom cloned voice...
Generate speech audio from text with low latency for real-time agents and interactive apps. Accept text input and output...
Hold multi-turn, multimodal conversations grounded in images, audio, video, and text, returning answers as text and opti...
Convert text to speech audio with low latency for real-time voice agents, chatbots, and interactive apps. Generate multi...
Convert text to speech with ultra-low latency for real-time voice agents, chatbots, and interactive apps. Accepts text p...
Generate short podcast clips with a talking AI host from a text prompt. Provide a podcaster_prompt, choose a voice (Wise...
Convert text to speech with low latency for voice agents, narration, and interactive applications. Accepts text (up to 5...
Generate speech and soundtracks from a video input. Condition speech on a provided transcript (text) and optionally a re...
Train and fine-tune a GPT-SoVITS voice model for voice cloning and text-to-speech. Input an audio or video dataset to ex...
Generate multilingual speech from text with preset voices, voice cloning, and voice design. Accept text plus optional la...
Generate speech from text with preset, cloned, or designed voices. Accept text as input and return spoken audio. Choose...
Synthesize Vietnamese speech from text input. Accepts Vietnamese text and returns spoken audio, with optional preprocess...
Convert text to speech with optional voice cloning from a reference audio sample. Accepts text and an optional speaker r...
Convert text to natural, expressive speech with sub-200ms latency. Accepts plain text (up to 2,000 characters) with SSML...
Generate speech audio from text for real-time voice agents and conversational apps. Accepts text input and outputs spoke...
Convert text to natural-sounding speech. Generate high-fidelity audio from up to 10,000 characters with 17+ preset voice...
Generate speech audio from text input. Control emotion (neutral, happy, sad, angry, fearful, disgusted, surprised, calm,...
Convert text to speech with zero-shot voice cloning from a reference audio sample. Provide target text and language (Eng...