lucataco/minicpm-v-4 🖼️📝🔢 → 📝

▶️ 219 runs 📅 Aug 2025 ⚙️ Cog 0.16.1
image-to-text ocr video-to-text visual-understanding

About

MiniCPM-V 4.0 has strong image and video understanding performance.

Example Output

Prompt:

"describe this image in detail"

Output:

The image depicts two fencers engaged in a bout on an indoor sports court. Both athletes are dressed in full white fencing gear, which includes masks, jackets, breeches, gloves, and shoes with protective elements. The fencer on the left is wearing blue accents on their socks and shoes, while the fencer on the right has black accents.

They appear to be in a dynamic stance typical of fencing, facing each other with their foils extended, ready for action. The background features a wooden floor marked with various colored lines indicative of multi-purpose usage by different sports activities. Behind them are wall bars used for exercises or training purposes, as well as windows that allow natural light into the room. A basketball hoop is mounted on the wall at the far end of the space, suggesting this area serves multiple functions within the facility. There are no visible audience members or additional equipment apart from what's necessary for fencing.

Performance Metrics

4.85s Prediction Time
68.39s Total Time
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/NWB3o7ex6nxH0WLE4gMMw19UoXHTu4tGVpSO5Pml7yhn52z4/replicate-prediction-j308y58jt9rma0crk8hamda7mr.mp4",
  "prompt": "describe this image in detail",
  "video_max_frames": 8
}
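As a rough illustration, an input object like the one above could be assembled and checked in Python before it is sent. This is only a sketch: `build_input` is a hypothetical helper, not part of the model or of any client library, and the checks simply mirror the constraints listed under Input Parameters on this page (prompt is required, video_max_frames defaults to 8 and ranges from 1 to 64). The page does not say whether supplying neither an image nor a video is an error, so the sketch does not enforce that.

```python
from typing import Optional

# Limits copied from the Input Parameters listing on this page.
MIN_FRAMES, MAX_FRAMES = 1, 64


def build_input(prompt: str,
                image: Optional[str] = None,
                video: Optional[str] = None,
                video_max_frames: int = 8) -> dict:
    """Assemble an input dict shaped like the example JSON (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    if not MIN_FRAMES <= video_max_frames <= MAX_FRAMES:
        raise ValueError("video_max_frames must be between 1 and 64")

    payload = {"prompt": prompt, "video_max_frames": video_max_frames}
    # Only include the optional media fields that were actually supplied.
    if image is not None:
        payload["image"] = image
    if video is not None:
        payload["video"] = video
    return payload
```

With the Replicate Python client, a dict like this would typically be passed as the `input=` argument of `replicate.run`, together with the model name and the version ID listed under Version Details. That call needs an API token and network access, so it is left out of the sketch.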
Input Parameters
image Type: string
Input image file (RGB).
video Type: string
Optional input video file. If provided, frames will be sampled and used as multiple images.
prompt (required) Type: string
User question/prompt for the image or video.
video_max_frames Type: integer, Default: 8, Range: 1 - 64
Maximum number of frames to sample from the video.
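The video parameter description says frames are sampled and passed to the model as multiple images, but the page does not state how the frames are chosen. The sketch below assumes evenly spaced sampling, which is one common approach. The function name `sample_frame_indices` and the midpoint-of-each-segment rule are illustrative assumptions, not the model's confirmed behavior.

```python
from typing import List


def sample_frame_indices(total_frames: int, max_frames: int = 8) -> List[int]:
    """Pick at most max_frames evenly spaced frame indices (assumed strategy)."""
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    if total_frames <= 0:
        return []
    # Short clips: keep every frame.
    if total_frames <= max_frames:
        return list(range(total_frames))
    # Split the clip into max_frames equal segments and take each midpoint.
    gap = total_frames / max_frames
    return [int(i * gap + gap / 2) for i in range(max_frames)]


# A 240-frame clip with the default limit of 8 yields one index per 30-frame segment.
print(sample_frame_indices(240, 8))
```

Under this assumption, raising video_max_frames gives the model denser coverage of the clip at the cost of more images per request.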
Output Schema

Output

Type: string

Version Details
Version ID
8b647b895c75cc7885d0a22d4fb1a0a2cb4fcf8ebbc13b78e09ec671f9183b27
Version Created
August 11, 2025