cjwbw/melotts
Generate speech from text in multiple languages. Accepts text plus a language code (EN, ES, FR, ZH, JP, KR), optional En...
Generate text responses from prompts using the Llama 3.2 3B instruction-tuned multilingual language model. Supports text...
Generate expressive speech from text with optional voice cloning from a short reference clip. Accept a text prompt and a...
Transcribe speech from audio to text. Handle multilingual transcription with automatic language detection and optional t...
Transcribe speech to text from audio input. Accepts an audio file and optionally a source language, returns a transcript...
Transcribe and understand audio with Voxtral Mini 3B, an advanced model that builds upon Ministral-3B. It excels in spee...
Generate videos with synchronized audio from text prompts using Alibaba's WAN 2.5 model. Creates fully synchronized vide...
Generate speech from text with optional voice cloning from a short reference audio. Accept text plus a 5–30s speaker sam...
Transcribe and translate speech to text from audio input. Supports multilingual ASR in English, French, German, Spanish,...
Generate chat and instruction-following text from prompts. Accepts a text prompt (and optional system prompt) and return...
Generate and chat in Indonesian, English, Sundanese, and Javanese from a text prompt. Takes text input and returns text...
Clone a voice from a short reference clip and generate speech from text. Accepts text and a reference audio sample; outp...
Generate multilingual speech from text with zero-shot voice cloning. Provide a short reference audio clip and its transc...
Generate natural, conversational speech and two-speaker dialogues from text. Choose from preset voices (Angelo, Arsenio,...
Extract text from images and PDFs in 90+ languages. Accept an image or multi-page PDF, a selected language list, and a p...
Generate high-resolution images up to 2K from text prompts with bilingual support for Chinese and English. Excels at cr...
Generate multilingual text-to-speech audio from text input. Convert up to ~10,000 characters per request into natural, e...
Translate text between languages. Accepts a single string or an array of strings (up to 5,000 characters each) and retur...
Generate expressive speech audio from text input. Choose from preset voices (e.g., Rachel, Drew, Paul, Aria, Domi, Dave,...
Generate text responses based on prompts and conversations. This 57-billion-parameter Mixture-of-Experts language model...