cjwbw/cogvlm: image + text → text

1.5M runs, Nov 2023, Cog 0.8.3

Tags: image-analysis, image-captioning, image-to-text, visual-question-answering, visual-understanding, vqa

Performance

12.8s Typical run time

~305s Cold start (first call)

1.5M Total runs

About

CogVLM is a powerful open-source visual language model. It takes an image and a text query as input and returns a text response.

Example Output

Output

This image captures a moment from a basketball game. Two players are prominently featured: one wearing a yellow jersey with the number 24 and the word 'Lakers' printed on it, and the other wearing a navy blue jersey with the word 'Washington' and the number 34. The player in yellow is holding a basketball and appears to be dribbling it, while the player in navy blue is reaching out with his arm, possibly trying to block or defend. The background shows a filled stadium with spectators, indicating that this is a professional game.

Performance Metrics

12.77s Prediction Time

304.81s Total Time
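
The gap between the two timings above is large. As a rough sketch, assuming the total time spans everything from request to result (so that whatever is not prediction time is overhead such as queueing and cold boot), the split can be worked out as follows. The variable names are illustrative only.

```python
# Timings copied from the metrics above, in seconds.
prediction_time = 12.77
total_time = 304.81

# Assumption: total time = overhead (queueing, cold boot) + prediction time.
overhead = round(total_time - prediction_time, 2)
overhead_share = round(overhead / total_time, 3)

print(f"overhead: {overhead}s ({overhead_share:.1%} of total)")
```

Under that assumption, nearly all of the wall-clock time in this example run was spent outside the prediction itself, which matches the cold-start figure listed under Performance.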

All Input Parameters

{
  "vqa": false,
  "image": "https://replicate.delivery/pbxt/JxpR9X9MatO10emxFW8GijURnrMAcQZ17fLJc5Xbu9zuQjwU/1.png",
  "query": "Describe this image."
}

Input Parameters

vqa: Type: boolean. Default: false. Enable VQA mode.
image (required): Type: string. Input image.
query: Type: string. Default: "Describe this image." Input query.
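
A minimal sketch of how these parameters might be passed to the model, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `describe` are hypothetical and not part of any API; the version ID is the one listed on this page.

```python
# Version ID as listed under Version Details on this page.
MODEL_VERSION = (
    "cjwbw/cogvlm:"
    "a5092d718ea77a073e6d8f6969d5c0fb87d0ac7e4cdb7175427331e1798a34ed"
)


def build_input(image: str, query: str = "Describe this image.", vqa: bool = False) -> dict:
    """Assemble the input payload (hypothetical helper).

    Parameter names and defaults mirror the list above.
    """
    if not image:
        raise ValueError("image is required")
    return {"image": image, "query": query, "vqa": vqa}


def describe(image: str, **kwargs) -> str:
    """Run the model on one image (hypothetical helper).

    Needs `pip install replicate` and a REPLICATE_API_TOKEN.
    """
    import replicate  # imported lazily so build_input works without the client

    output = replicate.run(MODEL_VERSION, input=build_input(image, **kwargs))
    # The output schema is a string; joining covers the case where the
    # client hands back the text in chunks (an assumption, not documented here).
    return output if isinstance(output, str) else "".join(output)
```

Calling `describe(...)` with the example image URL above and the default query should reproduce a request like the one shown under All Input Parameters; passing `vqa=True` with a specific question switches to VQA mode.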

Output Schema

Output

Type: string

Version Details

Version ID: a5092d718ea77a073e6d8f6969d5c0fb87d0ac7e4cdb7175427331e1798a34ed
Version Created: November 30, 2023
