cjwbw/melotts
Generate speech from text in multiple languages. Accepts text plus a language code (EN, ES, FR, ZH, JP, KR), optional En...
Found 85 models (showing 61-80)
Generate text and chat in multiple languages from text input. Produce long-form responses (up to ~4k new tokens) for question...
Generate expressive speech from text with optional voice cloning from a short reference clip. Accept a text prompt and a...
Transcribe speech from audio to text. Handle multilingual transcription with automatic language detection and optional t...
Transcribe speech to text from audio input. Accepts an audio file and optionally a source language, returns a transcript...
Transcribe and understand audio with Voxtral Mini 3B, an advanced model that builds upon Ministral-3B. It excels in spee...
Generate videos with audio from a text prompt. Produce 5–10 second clips at 480p, 720p, or 1080p across six aspect ratio...
Generate speech from text with optional voice cloning from a short reference audio. Accept text plus a 5–30s speaker sam...
Transcribe and translate speech to text from audio input. Supports multilingual ASR in English, French, German, Spanish,...
Generate multilingual chat completions and long-form text from a text prompt. Perform question answering, reasoning, cod...
Generate text and chat in Indonesian, English, Sundanese, and Javanese from a text prompt. Takes text input and returns text...
Clone a voice from a short reference clip and generate speech from text. Accepts text and a reference audio sample; outp...
Clone a voice and synthesize speech from text. Provide a reference audio clip and its transcript plus target text to gen...
Generate natural, conversational speech and two-speaker dialogues from text. Choose from preset voices (Angelo, Arsenio,...
Extract text from images and PDFs in 90+ languages. Accept an image or multi-page PDF, a selected language list, and a p...
Generate images from Chinese or English text prompts. Output native 2K-resolution images with selectable aspect ratios (...
Generate speech audio from text in 30+ languages. Accepts a text prompt and language code, with selectable preset voices...
Translate text between languages. Accepts a single string or an array of strings (up to 5,000 characters each) and retur...
Generate expressive speech audio from text input. Choose from preset voices (e.g., Rachel, Drew, Paul, Aria, Domi, Dave,...
Generate chat completions and long-form text from a prompt or system+user messages. Support up to 65,536-token context f...