lucataco/moondream2 🖼️📝 → 📝

▶️ 4.3M runs 📅 Mar 2024 ⚙️ Cog 0.9.13
image-captioning image-to-text ocr visual-question-answering

About

moondream2 is a small vision language model designed to run efficiently on edge devices.

Example Output

Prompt:

"Describe this image"

Output:

The image features a logo with a smiling blue circle above the word "moondream" written in black text.

Performance Metrics

0.57s Prediction Time
13.33s Total Time
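
The gap between the two figures above can be worked out directly. This is a minimal sketch that assumes the total time includes everything outside the model's own prediction step (for example queueing and setup); the page itself does not break the difference down.

```python
# Figures copied from the Performance Metrics above.
prediction_time_s = 0.57
total_time_s = 13.33

# Assumption: whatever is not prediction time is overhead (queueing, setup, etc.).
overhead_s = round(total_time_s - prediction_time_s, 2)
overhead_share = round(overhead_s / total_time_s, 3)

print(f"overhead: {overhead_s}s ({overhead_share:.1%} of total)")
```

For this example run, nearly all of the wall-clock time was spent outside the prediction itself.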
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KZKNhDQHqycw8Op7w056J8YTX5Bnb7xVcLiyB4le7oUgT2cY/moondream2.png",
  "prompt": "Describe this image"
}
Input Parameters
image (required) Type: string
Input image
prompt Type: string, Default: "Describe this image"
Input prompt
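
The two parameters above can be assembled into a request as sketched below. This is an illustration under stated assumptions, not the page's official client code: it assumes Replicate's HTTP endpoint `https://api.replicate.com/v1/predictions`, a bearer token read from the `REPLICATE_API_TOKEN` environment variable, and uses the version ID listed under Version Details. The helper names `build_input` and `build_request` are hypothetical.

```python
import json
import os
import urllib.request

# Version ID copied from the "Version Details" section of this page.
VERSION_ID = "72ccb656353c348c1385df54b237eeb7bfa874bf11486cf0b9473e691b662d31"
# Assumption: Replicate's public HTTP endpoint for creating predictions.
API_URL = "https://api.replicate.com/v1/predictions"


def build_input(image, prompt="Describe this image"):
    """Return the input object: `image` is required, `prompt` has a default."""
    if not image:
        raise ValueError("image is required")
    return {"image": image, "prompt": prompt}


def build_request(image, prompt="Describe this image", token=None):
    """Prepare (but do not send) a POST request that creates a prediction."""
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    body = {"version": VERSION_ID, "input": build_input(image, prompt)}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Nothing is sent until the request is passed to `urllib.request.urlopen`, which requires a valid token and network access. Omitting `prompt` falls back to the documented default.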
Output Schema

Output

Type: array, Items Type: string
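
Because the output is an array of strings rather than a single string, a caller usually needs to combine the items. The sketch below assumes the items are consecutive text fragments that should be concatenated in order; the page does not state how the text is split, and the fragments in the example are a hypothetical split of the example output shown earlier. The helper name `join_output` is also hypothetical.

```python
def join_output(chunks):
    """Concatenate an array of string items into one caption."""
    # Assumption: items are ordered fragments that carry their own spacing.
    return "".join(chunks).strip()


# Hypothetical fragments, for illustration only.
example_chunks = ["The image features ", "a logo with a smiling ", "blue circle"]
caption = join_output(example_chunks)
print(caption)
```

If the items turn out to be whole sentences or lines, joining with a space or newline would be the better choice.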

Example Execution Logs
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
Version Details
Version ID
72ccb656353c348c1385df54b237eeb7bfa874bf11486cf0b9473e691b662d31
Version Created
July 29, 2024