camenduru/minigpt4-video
Analyze and understand video content by generating textual descriptions from video inputs. This model uses interleaved v...
Generate realistic audio from video and text descriptions for professional-grade sound effect creation. Supports high-fi...
Moderate and classify text and images for safety policy compliance. Accept a text prompt and optional multiple images, a...
Run any ComfyUI workflow to generate images or videos. Provide the workflow as ComfyUI API JSON (“Save (API format)”) wi...
Edit images based on textual instructions using a multimodal large language model. This model allows users to input an i...
Classify the safety of multimodal inputs (image and user message) for content moderation. Accepts an image (required) an...