chenxwh/cogvlm2-video 🔢📝🖼️ → 📝

▶️ 663.4K runs 📅 Sep 2024 ⚙️ Cog 0.9.23 🔗 GitHub 📄 Paper ⚖️ License
multilingual video-auto-captioning video-question-answering video-to-text

About

CogVLM2: Visual Language Models for Image and Video Understanding

Example Output

Prompt:

"请仔细描述这个视频" ("Please describe this video in detail.")

Output

In the video, we see a large elephant walking across a dry grassland. The elephant's skin is covered in a vibrant, rainbow-colored pattern. The elephant's ears are large and floppy, and it has a long, curved trunk. The elephant's eyes are visible, and it appears to be moving purposefully. The background is a clear blue sky, and there are no other objects or creatures in sight. The elephant's colorful skin stands out against the natural surroundings, creating a striking visual contrast.

Performance Metrics

13.42s Prediction Time
99.11s Total Time
All Input Parameters
{
  "top_p": 0.1,
  "prompt": "请仔细描述这个视频",
  "input_video": "https://replicate.delivery/pbxt/LgGGFqrlw37TQMMWbjCTvYwp1vhH916HGqjKIuLxyB5SqiuT/%E5%A4%A7%E8%B1%A1.mp4",
  "temperature": 0.1,
  "max_new_tokens": 2048
}
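The parameters above map directly onto an API call. The following is a minimal sketch, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. `build_request` and `describe_video` are hypothetical helper names, not part of the model or the client; the version hash is the one listed under Version Details.

```python
MODEL = "chenxwh/cogvlm2-video"
VERSION = "9da7e9a554d36bb7b5fec36b43b00e4616dc1e819bc963ded8e053d8d8196cb5"


def build_request(video_url, prompt="Describe this video."):
    """Return the model reference and an input payload using the listed defaults."""
    ref = f"{MODEL}:{VERSION}"
    payload = {
        "input_video": video_url,  # required
        "prompt": prompt,
        "top_p": 0.1,
        "temperature": 0.1,
        "max_new_tokens": 2048,
    }
    return ref, payload


def describe_video(video_url, prompt="Describe this video."):
    """Run one prediction; needs network access and an API token."""
    import replicate  # third-party client, imported lazily (pip install replicate)

    ref, payload = build_request(video_url, prompt)
    return replicate.run(ref, input=payload)
```

Calling `describe_video("https://example.com/clip.mp4")` (a placeholder URL) would submit one prediction and return the caption.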
Input Parameters
top_p Type: number | Default: 0.1 | Range: 0 - 1
When decoding text, samples only from the most likely tokens whose cumulative probability reaches top_p; lower the value to ignore less likely tokens
prompt Type: string | Default: Describe this video.
Input prompt
input_video (required) Type: string
Input video
temperature Type: number | Default: 0.1 | Range: 0 - ∞
Adjusts the randomness of outputs: values greater than 1 are more random, and 0 is deterministic
max_new_tokens Type: integer | Default: 2048 | Range: 0 - ∞
Maximum number of tokens to generate. A word is generally 2-3 tokens
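To make the `temperature` and `top_p` descriptions concrete, here is a minimal pure-Python sketch of the textbook definitions of temperature scaling and nucleus (top-p) sampling. It is an illustration only, not the model's actual decoding code; all function names are hypothetical.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax of logits / temperature; temperature 0 means greedy (argmax)."""
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalise over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.1, top_p=0.1, rng=random):
    """Draw one token index using the page's default settings."""
    nucleus = top_p_filter(apply_temperature(logits, temperature), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]
```

With the defaults of 0.1 for both parameters, the nucleus usually collapses to the single most likely token, so output is close to deterministic.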
Output Schema

Output

Type: string
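Since the output is a single string, client code can usually use it directly. The sketch below is a defensive helper (the name `as_text` is hypothetical) that also accepts an iterable of string chunks, on the assumption that some clients or streaming modes deliver text in pieces.

```python
from collections.abc import Iterable


def as_text(output):
    """Normalise a prediction result to one string."""
    if isinstance(output, str):
        return output
    if isinstance(output, Iterable):
        # Assumed case: text delivered as a sequence of chunks.
        return "".join(str(chunk) for chunk in output)
    raise TypeError(f"unexpected output type: {type(output).__name__}")
```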

Example Execution Logs
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
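The repeated `UserWarning` above comes from bitsandbytes casting bfloat16 inputs to float16 during 8-bit matrix multiplication. If you run the model code yourself and treat this message as informational, a sketch like the following filters it with the standard `warnings` module; the helper name is hypothetical, and whether silencing is appropriate depends on your setup.

```python
import warnings

# Regex matched against the start of the warning message.
BNB_CAST_WARNING = r"MatMul8bitLt: inputs will be cast from .* to float16 during quantization"


def silence_bnb_cast_warning():
    """Ignore only the bitsandbytes dtype-cast warning; others still show."""
    warnings.filterwarnings("ignore", message=BNB_CAST_WARNING, category=UserWarning)
```

Call `silence_bnb_cast_warning()` once before loading the model.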
Version Details
Version ID
9da7e9a554d36bb7b5fec36b43b00e4616dc1e819bc963ded8e053d8d8196cb5
Version Created
September 24, 2024