lucataco/kosmos-2

1.9K runs · Nov 2023 · Cog 0.8.6
image-object-detection image-to-text

About

Grounding Multimodal Large Language Models to the World

Example Output

Output

{"img":"https://replicate.delivery/pbxt/drgxiFZGotafKiDanUGbyvkg0kOUeADYiJu6UxthUWu2Qx0RA/output.jpg","text":"Describe this image in detail: The image features a snowman sitting by a campfire in the snow. He is wearing a hat, scarf, and gloves, with a pot nearby and a cup nearby. The snowman appears to be enjoying the warmth of the fire, and it appears to have a warm and cozy atmosphere.

[('a campfire', (71, 81), [(0.171875, 0.015625, 0.484375, 0.984375)]), ('a hat', (109, 114), [(0.515625, 0.046875, 0.828125, 0.234375)]), ('scarf', (116, 121), [(0.515625, 0.234375, 0.890625, 0.578125)]), ('gloves', (127, 133), [(0.515625, 0.390625, 0.640625, 0.515625)]), ('a pot', (140, 145), [(0.078125, 0.609375, 0.265625, 0.859375)]), ('a cup', (157, 162), [(0.890625, 0.765625, 0.984375, 0.984375)])]"}
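
The `text` field above packs two things together: the generated description and, after a blank line, what looks like a Python-literal list of `(phrase, (start, end), [boxes])` tuples. The sketch below shows one way to split them apart and scale the boxes to pixels. It assumes that layout holds for every prediction, that `(start, end)` are character offsets into the description, and that each box is `(x_min, y_min, x_max, y_max)` normalized to the 0..1 range. The helper names and the 640x640 image size are hypothetical.

```python
import ast


def parse_kosmos_text(text):
    """Split the `text` field into (description, entities).

    Assumption: the entity list is a Python literal that follows a blank
    line at the end of the field, as in the example output on this page.
    """
    idx = text.rfind("\n\n[")
    if idx == -1:
        # No entity list found; treat the whole field as the description.
        return text.strip(), []
    description = text[:idx].strip()
    # literal_eval only accepts literals, so it does not execute code.
    entities = ast.literal_eval(text[idx:].strip())
    return description, entities


def to_pixel_boxes(entities, width, height):
    """Scale normalized boxes to integer pixel coordinates.

    Assumption: each box is (x_min, y_min, x_max, y_max) in the 0..1 range.
    """
    results = []
    for phrase, (start, end), boxes in entities:
        for x1, y1, x2, y2 in boxes:
            results.append({
                "phrase": phrase,
                "span": (start, end),
                "box": (round(x1 * width), round(y1 * height),
                        round(x2 * width), round(y2 * height)),
            })
    return results


# First sentence and first entity copied from the example output above.
sample_text = (
    "Describe this image in detail: The image features a snowman "
    "sitting by a campfire in the snow."
    "\n\n[('a campfire', (71, 81), [(0.171875, 0.015625, 0.484375, 0.984375)])]"
)

description, entities = parse_kosmos_text(sample_text)
boxes = to_pixel_boxes(entities, width=640, height=640)  # hypothetical image size
print(description[71:81], boxes[0]["box"])
```

Note that the offsets count from the start of the full text, including the "Describe this image in detail:" prompt, so the description should not be trimmed at the front before slicing.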

Performance Metrics

2.85s Prediction Time
91.24s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/JoIS31y9Oy2m04rBBICdzXUw7WleL1uCP6dyV5TeKTft2jjB/snowman.png",
  "visual_output": true,
  "description_type": "Detailed"
}
Input Parameters
image (required) Type: string
Input image
visual_output Type: boolean Default: true
Select to show the image with bounding boxes
description_type Default: Brief
Description Type
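
As a rough illustration of the parameters listed above, the following sketch builds an input payload and applies the documented defaults. The `build_input` helper is hypothetical, and restricting `description_type` to "Brief" and "Detailed" is an assumption based on the only two values that appear on this page.

```python
# Assumption: these are the only two values shown on this page; the model
# may accept others.
ALLOWED_DESCRIPTION_TYPES = ("Brief", "Detailed")


def build_input(image, visual_output=True, description_type="Brief"):
    """Return an input dict using the parameter names and defaults listed above."""
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    if not isinstance(visual_output, bool):
        raise TypeError("visual_output must be a boolean")
    if description_type not in ALLOWED_DESCRIPTION_TYPES:
        raise ValueError(f"description_type must be one of {ALLOWED_DESCRIPTION_TYPES}")
    return {
        "image": image,
        "visual_output": visual_output,
        "description_type": description_type,
    }


# Reproduces the "All Input Parameters" example from this page.
payload = build_input(
    "https://replicate.delivery/pbxt/JoIS31y9Oy2m04rBBICdzXUw7WleL1uCP6dyV5TeKTft2jjB/snowman.png",
    description_type="Detailed",
)
print(payload)
```

With the official `replicate` Python client installed and `REPLICATE_API_TOKEN` set, a payload like this would typically be passed as `replicate.run("lucataco/kosmos-2:<version id>", input=payload)`; that call is not executed here because it needs network access and credentials.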
Output Schema
img Type: string Format: uri
Img
text Type: string
Text
Version Details
Version ID
d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
Version Created
November 3, 2023