cuuupid/qwen2-vl-2b 🖼️🔢📝 → 📝

▶️ 599 runs 📅 Aug 2024 ⚙️ Cog 0.9.20
video-auto-captioning video-to-text

About

A state-of-the-art open-source model for chatting with videos and, as of August 2024, the newest model in the Qwen family.

Example Output

Prompt:

"Describe the video in great detail."

Output:

The video features a woman standing behind a podium, speaking to the camera. She is wearing a blue shirt and appears to be giving a presentation or lecture. The woman's facial expression suggests that she is engaged and passionate about the topic she is discussing. The background of the video is not visible, but it can be assumed that it is an indoor setting, possibly a conference room or lecture hall. The woman's speech is not audible, but her body language and gestures suggest that she is using hand movements to emphasize certain points. Overall, the video seems to be a formal presentation or lecture, with the woman as the main speaker.

Performance Metrics

16.38s Prediction Time
16.39s Total Time
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/LXVISWYD8Od0I7w6EW5VIO3sycOIcukn6H26wrkaOX95RK7E/dod_classification_training.mp4",
  "width": 128,
  "height": 128,
  "prompt": "Describe the video in great detail.",
  "max_tokens": 512,
  "temperature": 0.7,
  "max_duration": 180
}
Input Parameters
video (required) Type: string
Video to process
width Type: integer, Default: 128, Range: 128 - 2048
Width for the video
height Type: integer, Default: 128, Range: 128 - 2048
Height for the video
prompt Type: string, Default: Describe the video.
Prompt to use for the video
max_tokens Type: integer, Default: 128, Range: 1 - 8192
Maximum number of tokens to generate
temperature Type: number, Default: 0.7, Range: 0.01 - 1
Temperature for the model (0.7 is a good default).
max_duration Type: number, Default: 60, Range: 1 - 768
Maximum duration of the video in seconds (values above 360 may run out of VRAM).
repetition_penalty Type: number, Default: 1.1, Range: 0.01 - 1.5
Repetition penalty for the model (1.1 is a good default).
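
The defaults and ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's own code: `SPEC` and `build_input` are hypothetical names, and it assumes the documented ranges are inclusive at both ends.

```python
from typing import Any

# name -> (type, default, minimum, maximum), copied from the parameter list above.
# Assumption: both ends of each range are inclusive.
SPEC = {
    "width": (int, 128, 128, 2048),
    "height": (int, 128, 128, 2048),
    "max_tokens": (int, 128, 1, 8192),
    "temperature": (float, 0.7, 0.01, 1.0),
    "max_duration": (float, 60.0, 1.0, 768.0),
    "repetition_penalty": (float, 1.1, 0.01, 1.5),
}
DEFAULT_PROMPT = "Describe the video."


def build_input(video: str, prompt: str = DEFAULT_PROMPT, **overrides: Any) -> dict:
    """Return an input payload with defaults filled in and ranges checked."""
    if not video:
        raise ValueError("video is required")
    payload: dict = {"video": video, "prompt": prompt}
    for name, (kind, default, low, high) in SPEC.items():
        value = overrides.pop(name, default)
        # Reject booleans and non-numeric values up front.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
        # Integer parameters must not be given as floats.
        if kind is int and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} - {high}")
        payload[name] = kind(value)
    if overrides:
        # Anything left over is not a documented parameter.
        raise TypeError(f"unknown parameters: {sorted(overrides)}")
    return payload
```

Calling `build_input(video_url, max_tokens=512, max_duration=180)` would reproduce the shape of the example input shown earlier, with `repetition_penalty` filled in from its default.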
Output Schema

Output

Type: string

Version Details
Version ID
b3e77005f19950db4f8564bbedd5670a3e1bd293f1fefd0088d10d8e4d083dd4
Version Created
August 31, 2024
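
Pinning the version ID keeps results reproducible if the model is updated later. A minimal sketch of calling this exact version from Python follows. It assumes the third-party `replicate` client package is installed and that `REPLICATE_API_TOKEN` is set in the environment; `model_ref` and `describe_video` are hypothetical helper names.

```python
MODEL = "cuuupid/qwen2-vl-2b"
VERSION = "b3e77005f19950db4f8564bbedd5670a3e1bd293f1fefd0088d10d8e4d083dd4"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the 'owner/name:version' reference for a pinned version."""
    return f"{model}:{version}"


def describe_video(video_url: str, prompt: str = "Describe the video.") -> str:
    """Run the pinned version on a video URL and return the generated text."""
    # Imported lazily so the module loads even without the client installed.
    import replicate  # third-party: pip install replicate

    output = replicate.run(
        model_ref(),
        input={"video": video_url, "prompt": prompt},
    )
    # The output schema is a string; join defensively in case chunks are returned.
    return output if isinstance(output, str) else "".join(output)
```

Any of the other documented inputs (for example `max_tokens` or `max_duration`) can be added to the `input` dictionary in the same way.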