cjwbw/cogvlm ✓🖼️📝 → 📝

▶️ 1.5M runs 📅 Nov 2023 ⚙️ Cog 0.8.3
image-analysis image-captioning image-to-text visual-question-answering visual-understanding vqa

About

CogVLM is a powerful open-source visual language model. It takes an image and a text query as input and returns a text response.

Example Output

Output

This image captures a moment from a basketball game. Two players are prominently featured: one wearing a yellow jersey with the number 24 and the word 'Lakers' printed on it, and the other wearing a navy blue jersey with the word 'Washington' and the number 34. The player in yellow is holding a basketball and appears to be dribbling it, while the player in navy blue is reaching out with his arm, possibly trying to block or defend. The background shows a filled stadium with spectators, indicating that this is a professional game.

Performance Metrics

12.77s Prediction Time
304.81s Total Time
All Input Parameters
{
  "vqa": false,
  "image": "https://replicate.delivery/pbxt/JxpR9X9MatO10emxFW8GijURnrMAcQZ17fLJc5Xbu9zuQjwU/1.png",
  "query": "Describe this image."
}
Input Parameters
vqa Type: boolean, Default: false
Enable VQA (visual question answering) mode.
image (required) Type: string
Input image.
query Type: string, Default: "Describe this image."
Input query.
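
As a minimal sketch of how these parameters fit together, the snippet below builds the input payload with the documented defaults and passes it to the `replicate` Python client. It assumes the `replicate` package is installed and that a `REPLICATE_API_TOKEN` environment variable is set; the helper names `build_input` and `describe_image` are hypothetical and are not part of the model or the client.

```python
from typing import Any, Dict

# Model reference in "owner/name:version" form, using the version ID listed on this page.
MODEL_REF = (
    "cjwbw/cogvlm:"
    "a5092d718ea77a073e6d8f6969d5c0fb87d0ac7e4cdb7175427331e1798a34ed"
)


def build_input(
    image: str,
    query: str = "Describe this image.",
    vqa: bool = False,
) -> Dict[str, Any]:
    """Assemble the input dict; `image` is the only required parameter."""
    if not image:
        raise ValueError("image is required")
    return {"image": image, "query": query, "vqa": vqa}


def describe_image(image: str, query: str = "Describe this image.", vqa: bool = False) -> str:
    """Run the model once and return its text output (hypothetical helper)."""
    # Imported lazily so build_input() can be used without the client installed.
    import replicate

    output = replicate.run(MODEL_REF, input=build_input(image, query, vqa))
    # The output schema on this page is a single string; str() is a defensive cast.
    return str(output)
```

Calling `describe_image("https://example.com/photo.png")` (a placeholder URL) would send the default query; passing `vqa=True` with a specific question switches on VQA mode.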
Output Schema

Output

Type: string

Version Details
Version ID
a5092d718ea77a073e6d8f6969d5c0fb87d0ac7e4cdb7175427331e1798a34ed
Version Created
November 30, 2023
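
To illustrate what pinning this version ID looks like in practice, here is a hedged sketch that builds (but does not send) a request to Replicate's HTTP predictions endpoint using only the standard library. The function name `build_prediction_request` is hypothetical, and the token value is supplied by the caller.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
# Version ID copied from the Version Details above.
VERSION_ID = "a5092d718ea77a073e6d8f6969d5c0fb87d0ac7e4cdb7175427331e1798a34ed"


def build_prediction_request(
    token: str,
    image_url: str,
    query: str = "Describe this image.",
    vqa: bool = False,
) -> urllib.request.Request:
    """Build a POST request that creates a prediction on the pinned version."""
    body = {
        "version": VERSION_ID,
        "input": {"image": image_url, "query": query, "vqa": vqa},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the prediction; the response describes a prediction that typically has to be polled until it finishes, which this sketch leaves out.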