naklecha/cogvlm 🖼️📝 → 🖼️

▶️ 12.4K runs 📅 Nov 2023 ⚙️ Cog 0.8.6 🔗 GitHub 📄 Paper ⚖️ License
image-captioning image-to-text visual-question-answering

About

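CogVLM is a powerful open-source visual language model (VLM). Given an image and a text prompt, it generates a text response, such as a caption or an answer to a question about the image.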

Example Output

Prompt:

"describe this image"

Output:

This image captures a moment from a basketball game. Two players are prominently featured: one wearing a yellow jersey with the number 24 and the word 'Lakers' printed on it, and the other in a navy blue jersey with the word 'Washington' and the number 34. The player in yellow is holding a basketball and appears to be dribbling it, while the player in blue is reaching out with his arm, possibly trying to block or defend. The background shows a filled stadium with spectators, indicating that this is a professional game.

Performance Metrics

Prediction Time: 9.82s
Total Time: 9.84s

All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/JvescfwQHIsJnmvetHnzz7nFB4MzAqwA7VRp3Ug2r1r5MDTN/1.png",
  "prompt": "describe this image"
}

Input Parameters

image (required)
Type: string
Input image.

prompt (required)
Type: string
Input prompt.
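
The two inputs above can be sent to Replicate's HTTP predictions endpoint. The following is a minimal sketch, not the model author's own client code: it uses only the Python standard library, takes the version ID from the Version Details section of this page, and assumes an API token is supplied by the caller. The helper names build_input, build_request, and create_prediction are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Version ID as listed under "Version Details" on this page.
VERSION_ID = "03bb2a3156b39df8688a1f9097bc80389b376388a6dfeef0a4c5aa8119e17ef8"
API_URL = "https://api.replicate.com/v1/predictions"


def build_input(image_url: str, prompt: str) -> dict:
    """Assemble the two required inputs, rejecting empty values."""
    if not image_url or not prompt:
        raise ValueError("both 'image' and 'prompt' are required")
    return {"image": image_url, "prompt": prompt}


def build_request(image_url: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a prediction."""
    body = json.dumps(
        {"version": VERSION_ID, "input": build_input(image_url, prompt)}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_prediction(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To run it, pass the image URL and prompt from the example above together with your own token to build_request, then hand the result to create_prediction. The prediction is created asynchronously, so the returned object may still be in progress and may need to be polled before the output is available.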
Output Schema

Output

Type: array
Items Type: string
Items Format: uri
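
The schema declares the output as an array of URI strings, while the example output on this page is plain descriptive text. The sketch below makes no assumption about which form a given prediction returns: it accepts either a single string or a list of strings and separates items that parse as URIs from items that look like plain text. The helper names is_uri and split_output are hypothetical.

```python
from urllib.parse import urlparse


def is_uri(value: str) -> bool:
    """Treat a string as a URI only if it has both a scheme and a host."""
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)


def split_output(output):
    """Return (uris, texts) from an output that is a string or a list of strings."""
    items = [output] if isinstance(output, str) else list(output)
    uris, texts = [], []
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"expected a string item, got {type(item).__name__}")
        (uris if is_uri(item) else texts).append(item)
    return uris, texts
```

Items returned in the first list would need to be fetched to obtain their content. Items in the second list can be used directly as the model's text response.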

Version Details

Version ID: 03bb2a3156b39df8688a1f9097bc80389b376388a6dfeef0a4c5aa8119e17ef8
Version Created: November 23, 2023