chenxwh/cogvlm2 🔢📝🖼️ → 📝

▶️ 4.6K runs 📅 Sep 2024 ⚙️ Cog 0.9.23 🔗 GitHub 📄 Paper ⚖️ License

image-analysis image-captioning image-to-text visual-understanding

About

CogVLM2: Visual Language Models for Image and Video Understanding

Example Output

Prompt:

"Describe this image."

Output:

The image captures a well-lit, modern library or bookstore with a distinct industrial aesthetic. The main focus is a large, wooden bookshelf filled with an assortment of books, creating a warm and inviting atmosphere. The bookshelf is positioned against a rustic brick wall, which adds a touch of vintage charm to the space. The room is illuminated by hanging light bulbs, which dangle from the ceiling in a casual manner. There are also decorative elements such as a potted plant, a small framed sign, and a table with various items on it, enhancing the cozy ambiance. A person is seated at a desk in the foreground, suggesting the space is functional for reading or studying.

Performance Metrics

24.17s Prediction Time

210.01s Total Time
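
The two figures above differ by a wide margin. As a rough sketch of the arithmetic, the snippet below computes how much of the total was spent outside the prediction itself. Treating that remainder as queueing and cold-start overhead is an assumption; the page does not break the total down.

```python
# Timings copied from the metrics above, in seconds.
prediction_time = 24.17
total_time = 210.01

# Assumption: everything that is not prediction time is queueing or
# cold-start overhead. The page itself does not say what the gap contains.
overhead = round(total_time - prediction_time, 2)
overhead_share = round(overhead / total_time * 100, 1)

print(f"time outside prediction: {overhead}s ({overhead_share}% of total)")
```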

All Input Parameters

{
  "top_p": 0.9,
  "prompt": "Describe this image.",
  "input_image": "https://replicate.delivery/pbxt/Lg6495P56J6wq1WFLML3jPlCuXLOQii8eoPNb1iFwsFiJXVw/book3.jpeg",
  "temperature": 0.7,
  "max_new_tokens": 2048
}
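
A minimal sketch of sending the same input through the Replicate Python client. It assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set; the helper name `describe_image` is hypothetical, and the model reference is pinned to the version ID listed on this page.

```python
import os

# Model reference pinned to the version ID listed on this page.
MODEL_REF = (
    "chenxwh/cogvlm2:"
    "6d0394ab8fd98efb399c29c03c84c29f1d43c0259c6ee63d12f44def476c92b2"
)

# Same values as the JSON input above.
EXAMPLE_INPUT = {
    "top_p": 0.9,
    "prompt": "Describe this image.",
    "input_image": "https://replicate.delivery/pbxt/Lg6495P56J6wq1WFLML3jPlCuXLOQii8eoPNb1iFwsFiJXVw/book3.jpeg",
    "temperature": 0.7,
    "max_new_tokens": 2048,
}


def describe_image(model_input: dict) -> str:
    """Hypothetical helper: run one prediction and return the text output."""
    import replicate  # imported lazily so this file loads without the client

    output = replicate.run(MODEL_REF, input=model_input)
    # The output schema is a string. Joining covers the case where the
    # client yields chunks instead (an assumption, not documented here).
    return output if isinstance(output, str) else "".join(str(p) for p in output)


if __name__ == "__main__":
    if os.environ.get("REPLICATE_API_TOKEN"):
        print(describe_image(EXAMPLE_INPUT))
    else:
        print("Set REPLICATE_API_TOKEN to run the prediction.")
```

Pinning the version hash keeps results tied to the build documented here; dropping the hash would be expected to resolve to whichever version is latest.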

Input Parameters

top_p (number, default: 0.9, range: 0 to 1): When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens.
prompt (string, default: "Describe this image."): Input prompt.
input_image (string, required): Input image.
temperature (number, default: 0.7, range: 0 to ∞): Adjusts randomness of outputs; greater than 1 is random and 0 is deterministic.
max_new_tokens (integer, default: 2048, range: 0 to ∞): Maximum number of tokens to generate. A word is generally 2-3 tokens.
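
The defaults and ranges listed above can be checked client-side before a request is sent. The following is a sketch only: `validate_input` is a hypothetical helper, not part of the model or of any Replicate library, and the server may apply its own checks.

```python
# Defaults as documented in the parameter list above.
DEFAULTS = {
    "top_p": 0.9,
    "prompt": "Describe this image.",
    "temperature": 0.7,
    "max_new_tokens": 2048,
}


def validate_input(params: dict) -> dict:
    """Fill in documented defaults and enforce the documented ranges."""
    merged = {**DEFAULTS, **params}

    # input_image is the only required parameter.
    if not merged.get("input_image"):
        raise ValueError("input_image is required")
    if not 0 <= merged["top_p"] <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if merged["temperature"] < 0:
        raise ValueError("temperature must be >= 0")
    if not isinstance(merged["max_new_tokens"], int) or merged["max_new_tokens"] < 0:
        raise ValueError("max_new_tokens must be a non-negative integer")
    return merged


# Only the required field is supplied; the rest come from DEFAULTS.
payload = validate_input({"input_image": "https://example.com/photo.jpg"})
print(payload["temperature"], payload["max_new_tokens"])
```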

Output Schema

Output

Type: string

Example Execution Logs

/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")

Version Details

Version ID: 6d0394ab8fd98efb399c29c03c84c29f1d43c0259c6ee63d12f44def476c92b2
Version Created: September 24, 2024
