
ai-forever/kandinsky-2.2
Generate images from multilingual text prompts. Leverage a latent diffusion pipeline with a CLIP-ViT-G encoder for stron...
Found 66 models (showing 1-20)
Generate text descriptions and answers from a video input. Accepts a video and an optional prompt to perform video capti...
Convert text into dense vector embeddings for semantic search, retrieval, clustering, and classification. Accepts single...
Convert multilingual text into dense embeddings for semantic search and cross-lingual retrieval. Accepts a list of texts...
Generate and chat in multiple languages from a text prompt. Accepts a user prompt and optional system prompt and returns...
Generate multilingual chat responses and long-form text from a text prompt, returning text. Support up to 32K context fo...
Convert text into spoken audio for low-latency, real-time use. Choose from 300+ prebuilt voices or use a cloned voice, w...
Generate videos with synchronized audio from an image and text prompt. Accept an optional audio clip for voice or music...
Convert text into multilingual embeddings for semantic search and retrieval. Accepts a list of texts and returns a 768-d...
Create multilingual text and image embeddings for cross-modal search, retrieval, and similarity. Accept text (up to 8192...
Generate multilingual chat responses from text prompts. Handle question answering, document summarization, drafting, tra...
Transcribe speech from audio to text. Run Whisper large-v3 for multilingual automatic speech recognition (ASR), optional...
Generate expressive speech audio from text input. Control prosody, emotion, and acoustic context with a scene descriptio...
Clone a voice from a short audio sample and generate multilingual speech from text. Accepts a text prompt and a referenc...
Generate helpful text responses for instruction-following, reasoning, coding, and multilingual dialogue. Accepts a text...
Convert text to speech audio with adjustable speed and a wide selection of preset voices. Accepts text input (long passa...
Generate speech audio from text input. Accepts text and a selectable language code; returns spoken audio using Coqui TTS...
Clone voices and generate multilingual speech from text using a reference audio sample. Provide source audio and its tra...
Convert text to speech in multiple languages with selectable preset voices. Accepts text plus a language code and voice...
Generate images from a text prompt. Produce high-resolution outputs up to 4096×4096 with fast sampling and a wide artist...