naklecha/cogvlm 🖼️📝 → 🖼️

▶️ 12.5K runs 📅 Nov 2023 ⚙️ Cog 0.8.6

image-analysis image-captioning image-to-text visual-question-answering visual-understanding

Performance

9.8s Typical run time

12.5K Total runs

About

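CogVLM is a powerful open-source visual language model (VLM). It takes an image and a text prompt as input and is tagged for image captioning, image-to-text, and visual question answering.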

Example Output

Prompt:

"describe this image"

Output:

This image captures a moment from a basketball game. Two players are prominently featured: one wearing a yellow jersey with the number 24 and the word 'Lakers' printed on it, and the other in a navy blue jersey with the word 'Washington' and the number 34. The player in yellow is holding a basketball and appears to be dribbling it, while the player in blue is reaching out with his arm, possibly trying to block or defend. The background shows a filled stadium with spectators, indicating that this is a professional game.

Performance Metrics

9.82s Prediction Time

9.84s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/JvescfwQHIsJnmvetHnzz7nFB4MzAqwA7VRp3Ug2r1r5MDTN/1.png",
  "prompt": "describe this image"
}

Input Parameters

image (required) Type: string. Input image.
prompt (required) Type: string. Input prompt.
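
The two required parameters can be assembled and checked before a request is sent. The following is a minimal sketch, not the author's own client code: `build_input`, `model_ref`, and `run_cogvlm` are hypothetical helper names, and the call assumes the third-party `replicate` Python package is installed with a `REPLICATE_API_TOKEN` set in the environment. The model name and version ID are taken from this page.

```python
MODEL = "naklecha/cogvlm"
# Version ID as listed under "Version Details" on this page.
VERSION = "03bb2a3156b39df8688a1f9097bc80389b376388a6dfeef0a4c5aa8119e17ef8"


def build_input(image: str, prompt: str) -> dict:
    """Return the input payload after checking both required string fields."""
    if not isinstance(image, str) or not image.strip():
        raise ValueError("'image' is required and must be a non-empty string")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("'prompt' is required and must be a non-empty string")
    return {"image": image, "prompt": prompt}


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the 'owner/name:version' reference that pins a specific version."""
    return f"{model}:{version}"


def run_cogvlm(image: str, prompt: str):
    """Send one prediction request (assumes the `replicate` client is available)."""
    import replicate  # imported lazily so the helpers above work without it

    return replicate.run(model_ref(), input=build_input(image, prompt))
```

For instance, `run_cogvlm("https://replicate.delivery/pbxt/JvescfwQHIsJnmvetHnzz7nFB4MzAqwA7VRp3Ug2r1r5MDTN/1.png", "describe this image")` would resend the example request shown above. Pinning the version ID keeps results tied to the build documented here.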

Output Schema

Output

Type: array • Items Type: string • Items Format: uri
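
Because the schema declares an array of strings while the example output reads as one passage of text, client code may want to accept either shape. The sketch below is one hedged way to do that: `normalize_output` is a hypothetical helper, and it only coerces whatever is returned into a list of strings without interpreting the items (it does not assume they are URIs or text fragments).

```python
from collections.abc import Iterable


def normalize_output(output) -> list[str]:
    """Coerce a prediction result into a list of strings.

    Accepts None, a single string, or any iterable of items
    (a list or a generator, for example).
    """
    if output is None:
        return []
    if isinstance(output, str):
        # A bare string is one item, not a sequence of characters.
        return [output]
    if isinstance(output, Iterable):
        return [str(item) for item in output]
    return [str(output)]
```

Checking for `str` before `Iterable` matters, since a string is itself iterable and would otherwise be split into single characters.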

Version Details

Version ID: 03bb2a3156b39df8688a1f9097bc80389b376388a6dfeef0a4c5aa8119e17ef8
Version Created: November 23, 2023
