cjwbw/melotts
Generate speech from text in multiple languages. Accepts text plus a language code (EN, ES, FR, ZH, JP, KR), optional En...
Found 93 models (showing 61-80)
Generate and chat in multiple languages from text input. Produce long-form responses (up to ~4k new tokens) for question...
Generate expressive speech from text with optional voice cloning from a short reference clip. Accept a text prompt and a...
Transcribe speech from audio to text. Handle multilingual transcription with automatic language detection and optional t...
Transcribe speech to text from audio input. Accepts an audio file and optionally a source language, returns a transcript...
Transcribe and understand audio with Voxtral Mini 3B, an advanced model that builds upon Ministral-3B. It excels in spee...
Generate videos with audio from a text prompt. Produce 5–10 second clips at 480p, 720p, or 1080p across six aspect ratio...
Generate speech from text with optional voice cloning from a short reference audio. Accept text plus a 5–30s speaker sam...
Transcribe and translate speech to text from audio input. Supports multilingual ASR in English, French, German, Spanish,...
Generate chat and instruction-following text from prompts. Accepts a text prompt (and optional system prompt) and return...
Generate and chat in Indonesian, English, Sundanese, and Javanese from a text prompt. Takes text input and returns text...
Clone a voice from a short reference clip and generate speech from text. Accepts text and a reference audio sample; outp...
Generate multilingual speech from text with zero-shot voice cloning. Provide a short reference audio clip and its transc...
Generate natural, conversational speech and two-speaker dialogues from text. Choose from preset voices (Angelo, Arsenio,...
Extract text from images and PDFs in 90+ languages. Accept an image or multi-page PDF, a selected language list, and a p...
Generate images from Chinese or English text prompts. Output native 2K-resolution images with selectable aspect ratios (...
Generate multilingual text-to-speech audio from text input. Convert up to ~10,000 characters per request into natural, e...
Translate text between languages. Accepts a single string or an array of strings (up to 5,000 characters each) and retur...
Generate expressive speech audio from text input. Choose from preset voices (e.g., Rachel, Drew, Paul, Aria, Domi, Dave,...
Generate chat completions and long-form text from a prompt or system+user messages. Support up to 65,536-token context f...