Multimodal AI Models - Cloudernative

Analyze and understand video content by generating textual descriptions from video inputs. This model uses interleaved v...

📝 → 📝 • video-analysis • video-understanding • text-generation • 840 runs

Generate realistic audio from video and text descriptions for professional-grade sound effect creation. Supports high-fi...

🎥 • text-video-to-audio • audio-generation • video-sound-effects • 3 runs

Moderate text prompts, model responses, and images for safety compliance. Accepts text with optional multiple images and...

content-moderation • safety-classification • multimodal • 46.3K runs

Run any ComfyUI workflow to generate images or videos. Provide the workflow as ComfyUI API JSON (“Save (API format)”) wi...

comfyui • workflow-runner • 6.8M runs

Edit images based on textual instructions using a multimodal large language model. This model allows users to input an i...

🖼️ → 🖼️ • image-editing • text-guided-editing • multimodal • 6.9K runs

Classify the safety of multimodal inputs (image and user message) for content moderation. Accepts an image (required) an...

🖼️ • content-moderation • multimodal-safety • image-nsfw-detection • 1.5K runs

Analyze images and answer questions about them using a unified autoregressive framework for multimodal understanding....

🖼️ → 📝 • image-to-text • visual-understanding • multimodal • 6.7K runs

Generate text responses from text, image, video, and audio inputs with controllable reasoning depth. Supports up to 1 mi...

🖼️ → 📝 • text-generation • image-to-text • video-to-text • 6.8M runs

Accepts arbitrary sequences of image and text inputs to produce text outputs for multimodal tasks. Answers questions abo...

🖼️ → 📝 • image-to-text • visual-understanding • ocr • 1.2K runs