lucataco/moondream2 🖼️📝 → 📝

▶️ 15.1M runs 📅 Mar 2024 ⚙️ Cog 0.9.13

document-analysis image-captioning image-to-text ocr question-answering visual-question-answering visual-understanding vqa

Performance

0.6s Typical run time

~13s Cold start (first call)

15.1M Total runs

About

moondream2 is a small vision language model designed to run efficiently on edge devices.

Example Output

Prompt:

"Describe this image"

Output:

The image features a logo with a smiling blue circle above the word "moondream" written in black text.

Performance Metrics

0.57s Prediction Time

13.33s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KZKNhDQHqycw8Op7w056J8YTX5Bnb7xVcLiyB4le7oUgT2cY/moondream2.png",
  "prompt": "Describe this image"
}

Input Parameters

image (required) • Type: string • Input image
prompt • Type: string • Default: "Describe this image" • Input prompt
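
The two parameters above can be assembled into a prediction request. What follows is a minimal sketch, not the model author's client code: it assumes Replicate's public HTTP endpoint `https://api.replicate.com/v1/predictions` with a bearer token, and the helper names `build_input` and `build_request` are hypothetical. It only constructs the request; sending it is left to the caller.

```python
import json
import urllib.request

# Version ID copied from the "Version Details" section of this page.
MOONDREAM2_VERSION = "72ccb656353c348c1385df54b237eeb7bfa874bf11486cf0b9473e691b662d31"
# Assumption: Replicate's HTTP endpoint for creating predictions.
PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_input(image, prompt="Describe this image"):
    """Return the input dict: `image` is required, `prompt` falls back to the documented default."""
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    return {"image": image, "prompt": prompt}


def build_request(image, prompt="Describe this image", token="YOUR_API_TOKEN"):
    """Build (but do not send) a POST request for one prediction."""
    body = json.dumps(
        {"version": MOONDREAM2_VERSION, "input": build_input(image, prompt)}
    ).encode("utf-8")
    return urllib.request.Request(
        PREDICTIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (requires a real token and network access):
#   req = build_request("https://example.com/photo.png", token="<your token>")
#   with urllib.request.urlopen(req) as resp:
#       prediction = json.load(resp)
```

Validating `image` before building the request keeps a missing required field from turning into a remote error; omitting `prompt` reproduces the default shown above.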

Output Schema

Output

Type: array • Items Type: string
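
Because the output is an array of strings rather than a single string, callers usually need to combine the items. The sketch below assumes the items are consecutive text chunks of one answer that should be concatenated in order; that interpretation is not stated on this page, and `join_output` is a hypothetical helper.

```python
def join_output(chunks):
    """Concatenate an array-of-strings output into a single answer string.

    Assumption: the items are consecutive text chunks of one answer.
    """
    if isinstance(chunks, str):
        # Already a plain string; just trim surrounding whitespace.
        return chunks.strip()
    return "".join(chunks).strip()


# Illustrative input only, not captured model output.
caption = join_output(["The image features ", "a logo."])
print(caption)
```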

Example Execution Logs

The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.

Version Details

Version ID: 72ccb656353c348c1385df54b237eeb7bfa874bf11486cf0b9473e691b662d31
Version Created: July 29, 2024
