lucataco/kosmos-2

1.9K runs · Nov 2023 · Cog 0.8.6

image-object-detection image-to-text

Performance

2.8s Typical run time

~91s Cold start (first call)

1.9K Total runs

About

Grounding Multimodal Large Language Models to the World

Example Output

Output

{"img":"https://replicate.delivery/pbxt/drgxiFZGotafKiDanUGbyvkg0kOUeADYiJu6UxthUWu2Qx0RA/output.jpg","text":"Describe this image in detail: The image features a snowman sitting by a campfire in the snow. He is wearing a hat, scarf, and gloves, with a pot nearby and a cup nearby. The snowman appears to be enjoying the warmth of the fire, and it appears to have a warm and cozy atmosphere.

[('a campfire', (71, 81), [(0.171875, 0.015625, 0.484375, 0.984375)]), ('a hat', (109, 114), [(0.515625, 0.046875, 0.828125, 0.234375)]), ('scarf', (116, 121), [(0.515625, 0.234375, 0.890625, 0.578125)]), ('gloves', (127, 133), [(0.515625, 0.390625, 0.640625, 0.515625)]), ('a pot', (140, 145), [(0.078125, 0.609375, 0.265625, 0.859375)]), ('a cup', (157, 162), [(0.890625, 0.765625, 0.984375, 0.984375)])]"}
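
The `text` field in the example above packs two things into one string: the caption (prefixed by the prompt) and, after a blank line, a Python-style list of `(phrase, (start, end), [boxes])` tuples. The following is a minimal sketch of pulling them apart. It assumes this layout holds for other outputs, that each span indexes into the caption including the prompt, and that each box is normalized `(x1, y1, x2, y2)`. The helper names `split_output_text` and `to_pixel_box` are hypothetical, not part of the model's API.

```python
import ast

# Sample taken from the example output above.
SAMPLE_TEXT = (
    "Describe this image in detail: The image features a snowman sitting by "
    "a campfire in the snow. He is wearing a hat, scarf, and gloves, with a "
    "pot nearby and a cup nearby. The snowman appears to be enjoying the "
    "warmth of the fire, and it appears to have a warm and cozy atmosphere."
    "\n\n"
    "[('a campfire', (71, 81), [(0.171875, 0.015625, 0.484375, 0.984375)]), "
    "('a hat', (109, 114), [(0.515625, 0.046875, 0.828125, 0.234375)]), "
    "('scarf', (116, 121), [(0.515625, 0.234375, 0.890625, 0.578125)]), "
    "('gloves', (127, 133), [(0.515625, 0.390625, 0.640625, 0.515625)]), "
    "('a pot', (140, 145), [(0.078125, 0.609375, 0.265625, 0.859375)]), "
    "('a cup', (157, 162), [(0.890625, 0.765625, 0.984375, 0.984375)])]"
)


def split_output_text(text):
    """Split the `text` field into (caption, entities).

    Assumption: the entity list is the last blank-line-separated block and
    is a valid Python literal, as in the example output.
    """
    caption, sep, tail = text.rpartition("\n\n")
    if not sep or not tail.lstrip().startswith("["):
        return text.rstrip(), []  # no entity list found
    # literal_eval only evaluates literals, so it is safe on untrusted text.
    return caption.rstrip(), ast.literal_eval(tail.strip())


def to_pixel_box(box, width, height):
    """Scale a normalized (x1, y1, x2, y2) box to integer pixel coordinates.

    Assumption: the four values are fractions of image width and height.
    """
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


caption, entities = split_output_text(SAMPLE_TEXT)
for phrase, (start, end), boxes in entities:
    # In this sample, each span slices the phrase back out of the caption.
    print(phrase, "|", caption[start:end], "|", boxes[0])
```

To draw the boxes yourself, pass the real image dimensions to `to_pixel_box`; the page does not state the size of the example image.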

Performance Metrics

2.85s Prediction Time

91.24s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/JoIS31y9Oy2m04rBBICdzXUw7WleL1uCP6dyV5TeKTft2jjB/snowman.png",
  "visual_output": true,
  "description_type": "Detailed"
}

Input Parameters

image (required): Type: string. Input image.
visual_output: Type: boolean. Default: true. Select to show the image with bounding boxes.
description_type: Default: Brief. Description type.
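
The parameters above can be assembled into a request payload before calling the model. This is a minimal sketch only: `build_input` is a hypothetical helper, and the set of accepted `description_type` values is assumed to be the two that appear on this page (`Brief` and `Detailed`).

```python
# Assumption: only the two values seen on this page are accepted.
ALLOWED_DESCRIPTION_TYPES = ("Brief", "Detailed")


def build_input(image, visual_output=True, description_type="Brief"):
    """Return an input dict matching the parameter list above."""
    if not isinstance(image, str) or not image:
        raise ValueError("image is required and must be a non-empty string")
    if not isinstance(visual_output, bool):
        raise TypeError("visual_output must be a boolean")
    if description_type not in ALLOWED_DESCRIPTION_TYPES:
        raise ValueError(
            f"description_type must be one of {ALLOWED_DESCRIPTION_TYPES}"
        )
    return {
        "image": image,
        "visual_output": visual_output,
        "description_type": description_type,
    }


# Reproduces the "All Input Parameters" example.
payload = build_input(
    "https://replicate.delivery/pbxt/JoIS31y9Oy2m04rBBICdzXUw7WleL1uCP6dyV5TeKTft2jjB/snowman.png",
    description_type="Detailed",
)
```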

Output Schema

img: Type: string. Format: uri. Img.
text: Type: string. Text.
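
A small typed wrapper can make the two output fields easier to work with downstream. The sketch below assumes the result arrives as a plain dict, as in the example JSON. It treats `img` as optional, because this page does not say what is returned when `visual_output` is false. `KosmosOutput` and `parse_output` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class KosmosOutput:
    img: Optional[str]  # URI of the annotated image, if one was returned
    text: str           # caption followed by the entity list


def parse_output(raw):
    """Validate a raw output dict against the schema above."""
    text = raw["text"]
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    img = raw.get("img")
    if img is not None:
        img = str(img)  # tolerate URL-like objects from a client library
        if urlparse(img).scheme not in ("http", "https"):
            raise ValueError("img must be an http(s) URI")
    return KosmosOutput(img=img, text=text)
```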

Version Details

Version ID: d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
Version Created: November 3, 2023
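
Pinning the version ID above keeps results reproducible if the model is later updated. A minimal sketch using the `replicate` Python client, assuming it is installed and that `REPLICATE_API_TOKEN` is set in the environment; `model_ref` and `run_kosmos2` are hypothetical helper names.

```python
MODEL = "lucataco/kosmos-2"
VERSION = "d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a"


def model_ref(model=MODEL, version=VERSION):
    """Build the 'owner/name:version' reference string."""
    return f"{model}:{version}"


def run_kosmos2(image_url, description_type="Brief", visual_output=True):
    """Run the pinned version and return the raw output.

    The import is deferred so this module loads without the client installed.
    """
    import replicate  # third-party: pip install replicate

    return replicate.run(
        model_ref(),
        input={
            "image": image_url,
            "visual_output": visual_output,
            "description_type": description_type,
        },
    )
```

Expect the first call after a period of inactivity to take much longer than later ones, in line with the cold start figure reported above.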
