lucataco/minicpm-v-4 🖼️📝🔢 → 📝
About
MiniCPM-V 4.0 is a vision-language model with strong image and video understanding performance. It takes an image or a video plus a text prompt and returns text.

Example Output
"describe this image in detail"
Output
The image depicts two fencers engaged in a bout on an indoor sports court. Both athletes are dressed in full white fencing gear, which includes masks, jackets, breeches, gloves, and shoes with protective elements. The fencer on the left is wearing blue accents on their socks and shoes, while the fencer on the right has black accents.
They appear to be in a dynamic stance typical of fencing, facing each other with their foils extended, ready for action. The background features a wooden floor marked with various colored lines indicative of multi-purpose usage by different sports activities. Behind them are wall bars used for exercises or training purposes, as well as windows that allow natural light into the room. A basketball hoop is mounted on the wall at the far end of the space, suggesting this area serves multiple functions within the facility. There are no visible audience members or additional equipment apart from what's necessary for fencing.
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/NWB3o7ex6nxH0WLE4gMMw19UoXHTu4tGVpSO5Pml7yhn52z4/replicate-prediction-j308y58jt9rma0crk8hamda7mr.mp4",
  "prompt": "describe this image in detail",
  "video_max_frames": 8
}
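A minimal sketch of sending the same input from Python, assuming the official `replicate` client package is installed and `REPLICATE_API_TOKEN` is set in the environment. The names `MODEL_REF`, `EXAMPLE_INPUT`, and `run_example` are illustrative and not part of the model page. The version ID is the one listed under Version Details.

```python
import os

# Model reference in Replicate's "owner/name:version" form. The version ID is
# the one listed under Version Details on this page.
MODEL_REF = (
    "lucataco/minicpm-v-4:"
    "8b647b895c75cc7885d0a22d4fb1a0a2cb4fcf8ebbc13b78e09ec671f9183b27"
)

# Same input as the example shown on this page.
EXAMPLE_INPUT = {
    "video": "https://replicate.delivery/pbxt/NWB3o7ex6nxH0WLE4gMMw19UoXHTu4tGVpSO5Pml7yhn52z4/replicate-prediction-j308y58jt9rma0crk8hamda7mr.mp4",
    "prompt": "describe this image in detail",
    "video_max_frames": 8,
}


def run_example(model_input=EXAMPLE_INPUT):
    """Send the input to Replicate and return whatever the model outputs."""
    # Imported lazily so this module loads even without the client installed.
    import replicate

    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling run_example().")
    # Blocks until the prediction finishes; the return type depends on the
    # model's output schema, so it is passed through unchanged here.
    return replicate.run(MODEL_REF, input=model_input)
```

With a token set, `print(run_example())` should print a description similar to the example output, though the exact wording may differ between runs.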
Input Parameters
- image: Input image file (RGB).
- video: Optional input video file. If provided, frames will be sampled and used as multiple images.
- prompt (required): User question/prompt for the image or video.
- video_max_frames: Maximum number of frames to sample from the video.
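To make the effect of `video_max_frames` concrete, here is a sketch of one way a frame cap could be applied. The page does not say how frames are chosen, so evenly spaced sampling is an assumption, and `sample_frame_indices` is a hypothetical helper rather than the model's actual code.

```python
def sample_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Pick at most max_frames evenly spaced frame indices.

    Assumption: uniform spacing. The model's real sampling strategy is not
    documented on this page.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    # Short videos: keep every frame.
    if total_frames <= max_frames:
        return list(range(total_frames))
    # Split the video into max_frames equal segments and take the frame
    # nearest the middle of each segment.
    step = total_frames / max_frames
    return [int(i * step + step / 2) for i in range(max_frames)]


# An 80-frame clip capped at 8 frames keeps one frame per 10-frame segment.
print(sample_frame_indices(80, 8))  # [5, 15, 25, 35, 45, 55, 65, 75]
```

Under this assumption, a larger `video_max_frames` gives the model more of the video to look at, at the cost of more images to process per request.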
Output Schema
Output
Version Details
- Version ID: 8b647b895c75cc7885d0a22d4fb1a0a2cb4fcf8ebbc13b78e09ec671f9183b27
- Version Created: August 11, 2025