sai88uk/minicpm-v-45-v9
About
A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding
Example Output
Output
Looking at the shops and stores, I see HARRODS, "AMSTON", "KANOMO", "SOLARIS", "KARD RURON", "KONINGS", and "MEXITCHOS". There are also smaller signs like "BAR & KITCHEN", "MEXITCHOS", and "MARLBORO". Wait, let me make sure I don't miss any. Also, some signs are partial, like "TSN" and "TSNO". There's a large "Coca-Cola" ad on the building, and another one with a person, maybe part of a campaign.
Let me list them all systematically. Start with the main billboards: Coca-Cola (multiple ads, probably different positions), SANYO, CANO, KODAK, San Benedetto La Bottega. Then the buildings: HARRODS (large, with "HARRODS" visible), KARD RURON, AMSTON, KANOMO, SOLARIS, KONINGS, MEXITCHOS, BAR & KITCHEN, RECTIONS, MARLBORO. And some smaller ones: "TSN", "TSNO", "1800", "TSNO" (another one), "KARD RURON", "KARD RURON" (maybe same as above), "KARD RURON" again? Wait, maybe I misread. Let me check again. The image shows "HARRODS", "KARD RURON", "AMSTON", "KANOMO", "SOLARIS", "KONINGS", "MEXITCHOS", "BAR & KITCHEN", "REACTIONS", "MARLBORO". Also, the statue has "San Benedetto La Bottega".
Wait, the question is to read all the signs and text in the video. So I need to include all visible text, even if partially. Let me go through each element.
First, billboards:
- Coca-Cola (appears multiple times, different ads)
- SANYO
- CANO (Cano logo advertisement)
- KODAK
- San Benedetto La Bottega (on the statue, likely a sign on the building)
Then shop signs on the buildings:
- HARRODS (visible on the left building with the curved roof)
- KARD RURON (multiple instances, possibly same sign)
- AMSTON (blue sign below a Coca-Cola ad)
- KANOMO (blue sign, "KANOMO")
- SOLARIS (blue sign, "SOLARIS")
- KONINGS (blue sign, "KONINGS")
- MEXITCHOS (blue sign, "MEXITCHOS")
- BAR & KITCHEN (yellow sign, "BAR & KITCHEN")
- REACTIONS (blue sign, "REACTIONS")
- MARLBORO (maybe part of a building sign, "MARLBORO")
- Also, some smaller signs: "TSN", "TSNO", "1800", "MEXITCHOS" (maybe repeated), "KARD RURON" (maybe repeated, but probably the same sign).
Wait, let me check the video description again. The user says "read all the signs and text exist in this video and write them down". So I need to make sure I capture every single text visible. Let me list them all:
Big billboards:
- Coca-Cola (multiple ads, e.g., red Coca-Cola sign, another Coca-Cola ad on a blue background)
- SANYO
- CANO (Cano logo advertisement, "Cano" text)
- KODAK (yellow KODAK sign)
- San Benedetto La Bottega (text on the statue, part of the building sign)
Shop signs and storefronts:
- HARRODS (large, prominent on the left building)
- KARD RURON (appears multiple times, possibly on different parts of the building)
- AMSTON (blue sign, "AMSTON")
- KANOMO (blue sign, "KANOMO")
- SOLARIS (blue sign, "SOLARIS")
- KONINGS (blue sign, "KONINGS")
- MEXITCHOS (blue sign, "MEXITCHOS")
- BAR & KITCHEN (yellow sign, "BAR & KITCHEN")
- REACTIONS (blue sign, "REACTIONS")
- MARLBORO (part of a sign, "MARLBORO")
- TSNO (red billboard, "TSNO")
- 1800 (red billboard, "1800" or "1890"?)
Wait, the red billboard has "TSNO" and "1800". Let me confirm. Yes, the red billboard in the center has "TSNO" and "1800".
Also, there are signs like "STOP" on the traffic sign pole, but that's not part of the main commercial signs. The horse has a saddle, but no text on it. The statue is a landmark with "San Benedetto La Bottega" text.
So compiling all visible texts:
- Coca-Cola (multiple billboards)
- SANYO
- CANO
- KODAK
- San Benedetto La Bottega (on the statue)
- HARRODS
- KARD RURON (multiple instances, same sign)
- AMSTON
- KANOMO
- SOLARIS
- KONINGS
- MEXITCHOS
- BAR & KITCHEN
- REACTIONS
- MARLBORO
- TSNO (billboard)
- 1800 (billboard)
- STOP (traffic sign)
Wait, but the question says "read all the signs and text exist in this video". The traffic sign "STOP" is a sign, but maybe the main commercial ones are the focus. However, let's include all visible texts, even the traffic sign, unless specified otherwise. But the user might want the big commercial signs and store signs. Let me check the original description again.
The scene is a busy city square with cars, a horse, people, and prominent billboards and shop signs. The key texts on billboards are Coca-Cola, SANYO, CANO, KODAK, San Benedetto La Bottega. The shop signs are HARRODS, KARD RURON, AMSTON, KANOMO, SOLARIS, KONINGS, MEXITCHOS, BAR & KITCHEN, REACTIONS, MARLBORO. The billboards also have TSNO and 1800. The traffic sign has STOP.
So, compiling all visible texts:
- Coca-Cola (large, red; another Coca-Cola ad on blue background)
- SANYO (large billboard)
- CANO (Cano logo advertisement)
- KODAK (large yellow sign)
- San Benedetto La Bottega (text on the statue's building)
- HARRODS (store sign)
- KARD RURON (store sign, multiple instances)
- AMSTON (store sign)
- KANOMO (store sign)
- SOLARIS (store sign)
- KONINGS (store sign)
- MEXITCHOS (store sign)
- BAR & KITCHEN (store sign)
- REACTIONS (store sign)
- MARLBORO (store sign)
- TSNO (billboard text)
- 1800 (billboard text)
- STOP (traffic sign text)
Wait, but do we need to include "San Benedetto La Bottega" as part of the billboard or the building sign? The text is on the statue, which is a landmark, so it's part of the building's signage.
Also, check if any other texts are visible. For example, the red double-decker bus has "AMSTON" as a destination, but that's part of the bus ad. Wait, the bus might have its own signs, but the question is about signs and text in the scene, not on vehicles. Wait, the question says "Look through this video and read all the signs and text exist in this video and write them down for me." So probably all visible texts, including on buildings, billboards, vehicles, signs, etc. But perhaps the main focus is on the building signs and billboards. Let me check the original video description again.
The user provided a description of the scene: "Look through this video and read all the signs and text exist in this video and write them down for me." So I need to include all visible text, whether on buildings, billboards, vehicles, signs, etc. But the video is a still image, so the descriptions are based on the image provided. Let's go through the elements described:
- Large red Coca-Cola sign
- "SANYO" on a billboard
- "CANO" on a billboard
- "KODAK" on a yellow billboard
- "San Benedetto La Bottega" on the statue's building
- "HARRODS" on the curved building
- "KARD RURON" on a building
- "AMSTON" on a building
- "KANOMO" on a building
- "SOLARIS" on a building
- "KONINGS" on a building
- "MEXITCHOS" on a building
- "BAR & KITCHEN" on a building
- "REACTIONS" on a building
- "MARLBORO" on a building
- "TSN" on a red billboard
- "TSNO" on a red billboard
- "1800" on a red billboard
- "STOP" on a traffic sign pole
So, compiling all these:
Billboards and large signs:
- Coca-Cola (multiple)
- SANYO
- CANO
- KODAK
- San Benedetto La Bottega
- HARRODS
- KARD RURON
- AMSTON
- KANOMO
- SOLARIS
- KONINGS
- MEXITCHOS
- BAR & KITCHEN
- REACTIONS
- MARLBORO
- TSN
- TSNO
- 1800
Traffic signs:
- STOP
That's the list. Now, let me check for any possible repetitions or missed texts. For example, the red billboard has "TSN" and "TSNO" and "1800". The building with the curved roof has "HARRODS". The building signs include "KARD RURON", "AMSTON", "KANOMO", "SOLARIS", "KONINGS", "MEXITCHOS", "BAR & KITCHEN", "REACTIONS", "MARLBORO". The statue's building has "San Benedetto La Bottega". The traffic sign has "STOP".
Yes, that seems comprehensive. So all the visible texts are the ones listed above.
Here are all the visible signs and text extracted from the video:
1. Billboards and Large Advertisements
- Coca-Cola (multiple billboards, including red and blue backgrounds)
- SANYO (large billboard on the left building)
- CANO (Cano logo advertisement billboard)
- KODAK (large yellow billboard)
- San Benedetto La Bottega (text on the statue's building, part of the landmark signage)
2. Storefronts and Building Signs
- HARRODS (prominent on the curved building with classical architecture)
- KARD RURON (multiple instances visible on adjacent buildings)
- AMSTON (blue sign on a building below a Coca-Cola ad)
- KANOMO (blue sign on a building)
- SOLARIS (blue sign on a building)
- KONINGS (blue sign on a building)
- MEXITCHOS (blue sign on a building)
- BAR & KITCHEN (yellow sign on a building)
- REACTIONS (blue sign on a building)
- MARLBORO (text on a building, likely part of a storefront or adjacent signage)
3. Billboards with Specific Text
- TSNO (red billboard in the center, below "Coca-Cola")
- 1800 (red billboard in the center, next to "TSNO")
4. Traffic and Miscellaneous Signs
- STOP (traffic sign pole, regulating vehicle flow)
These are all the visible signs and text elements present in the busy city scene.
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/NpB0xL7i51BQmIaLWBGAF1WidcojvAhkiWa8beEO2am7HJMd/tmp0h70a7wf.mp4",
  "question": "Look through this video and read all the signs and text exist in this video and write them down for me",
  "video_fps": 10,
  "thinking_mode": "ultra"
}
Input Parameters
- image: Input image (for image chat)
- video: Input video (for video analysis)
- question (required): Question about the image/video
- video_fps: Video FPS (auto-adjusted for long videos)
- thinking_mode: Thinking mode
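The parameters above can be assembled and sent with the Replicate Python client. The following is a minimal sketch, not the author's own code: the helper names `build_input` and `run_model` are hypothetical, the model reference joins the model name on this page with the version ID listed under Version Details, and the call assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment. The page does not say whether `image` and `video` may be supplied together, so the sketch does not enforce either choice.

```python
from typing import Any, Dict, Optional

# Model name from this page plus the version ID listed under Version Details.
MODEL_REF = (
    "sai88uk/minicpm-v-45-v9:"
    "5f9e86550c3540aab9292e0cae22f71bb75724be3c9bb72ebf0798d028f0f27b"
)


def build_input(
    question: str,
    image: Optional[str] = None,
    video: Optional[str] = None,
    video_fps: Optional[int] = None,
    thinking_mode: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the input dict; only `question` is required (hypothetical helper)."""
    if not question:
        raise ValueError("question is required")
    payload: Dict[str, Any] = {"question": question}
    optional = {
        "image": image,
        "video": video,
        "video_fps": video_fps,
        "thinking_mode": thinking_mode,
    }
    # Omit unset parameters so the model's own defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def run_model(payload: Dict[str, Any]) -> Any:
    """Send the payload to Replicate (needs network access and an API token)."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(MODEL_REF, input=payload)
```

For example, `build_input("Describe the scene", video="https://example.com/clip.mp4", video_fps=10, thinking_mode="ultra")` mirrors the shape of the request shown under All Input Parameters; the URL here is a placeholder.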
Output Schema
Output
Example Execution Logs
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM-V-4_5:
- image_processing_minicpmv.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM-V-4_5:
- processing_minicpmv.py
- image_processing_minicpmv.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
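The log above suggests pinning a revision so that remote code files are not silently re-downloaded. A minimal sketch of what that could look like with the Hugging Face `transformers` loaders follows. This is an assumption about how one might load the upstream checkpoint directly, not a description of how this Replicate deployment is built; `pinned_kwargs` and `load_pinned` are hypothetical helper names, and the revision must be a real commit hash or tag taken from the model repository, since none is given on this page.

```python
MODEL_ID = "openbmb/MiniCPM-V-4_5"  # repository named in the log above


def pinned_kwargs(revision: str) -> dict:
    """Return loader kwargs that pin the repository to one revision."""
    if not revision or revision.startswith("<"):
        raise ValueError("supply a real commit hash or tag from the model repository")
    # trust_remote_code is needed because the repo ships its own processing code.
    return {"revision": revision, "trust_remote_code": True}


def load_pinned(revision: str):
    """Load processor and model at a fixed revision (downloads the weights)."""
    from transformers import AutoModel, AutoProcessor  # lazy import

    kwargs = pinned_kwargs(revision)
    processor = AutoProcessor.from_pretrained(MODEL_ID, **kwargs)
    model = AutoModel.from_pretrained(MODEL_ID, **kwargs)
    return processor, model
```

Pinning both loaders to the same revision keeps the processing code and the weights in step, which is the situation the warning is asking the user to verify.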
Version Details
- Version ID: 5f9e86550c3540aab9292e0cae22f71bb75724be3c9bb72ebf0798d028f0f27b
- Version Created: October 3, 2025