
ai-forever/kandinsky-2.2
Generate images from multilingual text prompts. Leverage a latent diffusion pipeline with a CLIP-ViT-G encoder for stron...
Found 65 models (showing 1-20)
Generate text descriptions and answers from a video input. Accepts a video and an optional prompt to perform video capti...
Convert text into vector embeddings for semantic search, retrieval, clustering, and classification. Accepts a string or...
Convert multilingual text into dense embeddings for semantic search and cross-lingual retrieval. Accepts a list of texts...
Generate and chat in multiple languages from a text prompt. Accepts a user prompt and optional system prompt and returns...
Generate multilingual chat responses and long-form text from a text prompt, returning text. Support up to 32K context fo...
Convert text into spoken audio for low-latency, real-time use. Choose from 300+ prebuilt voices or use a cloned voice, w...
Generate videos with synchronized audio from an image and text prompt. Accept an optional audio clip for voice or music...
Convert text into multilingual embeddings for semantic search and retrieval. Accepts a list of texts and returns a 768-d...
Embed text and images into a shared vector space for cross-modal search, retrieval, and similarity. Accepts text (up to...
Generate multilingual chat responses from text prompts. Handle question answering, document summarization, drafting, tra...
Transcribe speech from audio to text. Run Whisper large-v3 for multilingual automatic speech recognition (ASR), optional...
Generate expressive speech audio from text input. Control prosody, emotion, and acoustic context with a scene descriptio...
Clone a voice from a short audio sample and generate multilingual speech from text. Accepts a text prompt and a referenc...
Generate helpful text responses for instruction-following, reasoning, coding, and multilingual dialogue. Accepts a text...
Convert text to speech audio with adjustable speed and a wide selection of preset voices. Accepts text input (long passa...
Generate speech audio from text input. Accepts text and a selectable language code; returns spoken audio using Coqui TTS...
Clone a voice and synthesize speech from text using a short reference audio sample. Accepts source audio and its transcr...
Convert text to speech in multiple languages with selectable preset voices. Accepts text plus a language code and voice...
Generate images from a text prompt. Produce high-resolution outputs up to 4096×4096 with fast sampling and a wide artist...