lucataco/qwen2-vl-7b-instruct 🖼️📝🔢 → 📝

▶️ 259.6K runs 📅 Dec 2024 ⚙️ Cog 0.13.6 🔗 GitHub 📄 Paper ⚖️ License
image-to-text ocr video-to-text

About

Latest model in the Qwen family for chatting about videos and images

Example Output

Prompt:

"Describe this video in detail."

Output:

The video features a monkey riding a skateboard in a park. The monkey is wearing roller skates and is seen moving forward on the skateboard, with its arms outstretched for balance. The park is filled with people sitting on benches and walking around, enjoying the sunny day. The monkey appears to be enjoying itself, and the scene is filled with a sense of fun and playfulness. The park is surrounded by trees and greenery, creating a peaceful and relaxing atmosphere. Overall, the video captures a unique and entertaining moment of a monkey enjoying a day out in the park on roller skates.

Performance Metrics

6.24s Prediction Time
6.25s Total Time
All Input Parameters
{
  "media": "https://replicate.delivery/pbxt/MB8qw19bkjGGCTr8Px17db2ydBA3xrHyxBk5g5wRSEH0in9N/q2m-LO3Xg0vO0xmw.mp4",
  "prompt": "Describe this video in detail.",
  "max_new_tokens": 128
}
Input Parameters
media (required) Type: string
Input image or video file
prompt Type: string Default: "Describe this in detail."
Custom prompt to guide the description
max_new_tokens Type: integer Default: 128 Range: 1 - 512
Maximum number of tokens to generate
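
The parameters above can be assembled and checked client-side before a prediction is submitted. The following is a minimal sketch, not part of the model's own code: the helper name `build_input` is hypothetical, and the defaults and the 1 - 512 range are taken from the parameter list above.

```python
from typing import Any, Dict

# Default taken from the parameter list on this page.
DEFAULT_PROMPT = "Describe this in detail."


def build_input(
    media: str,
    prompt: str = DEFAULT_PROMPT,
    max_new_tokens: int = 128,
) -> Dict[str, Any]:
    """Build an input payload (hypothetical helper, for illustration only)."""
    # `media` is the only required field: an image or video file/URL.
    if not media:
        raise ValueError("media is required")
    # The page documents an allowed range of 1 - 512 for max_new_tokens.
    if not 1 <= max_new_tokens <= 512:
        raise ValueError("max_new_tokens must be between 1 and 512")
    return {
        "media": media,
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
    }


if __name__ == "__main__":
    # example.com URL is a placeholder, not a real media file.
    print(build_input("https://example.com/clip.mp4", "Describe this video in detail."))
```

Validating locally like this only avoids submitting a request that would be rejected for an out-of-range value; the service remains the authority on what it accepts.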
Output Schema

Output

Type: string

Version Details
Version ID
bf57361c75677fc33d480d0c5f02926e621b2caa2000347cb74aeae9d2ca07ee
Version Created
December 20, 2024
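
To reproduce results against this exact version, the model name and version ID can be combined into a pinned reference. The sketch below assumes the third-party `replicate` Python client is installed and that a `REPLICATE_API_TOKEN` environment variable is set; the helper names `model_ref` and `describe` are hypothetical, and the handling of a streamed return value is a precaution rather than documented behavior of this model.

```python
MODEL = "lucataco/qwen2-vl-7b-instruct"
VERSION = "bf57361c75677fc33d480d0c5f02926e621b2caa2000347cb74aeae9d2ca07ee"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Return the pinned 'owner/name:version' reference string."""
    return f"{model}:{version}"


def describe(media_url: str, prompt: str = "Describe this in detail.",
             max_new_tokens: int = 128) -> str:
    """Run one prediction (hypothetical wrapper; needs network and an API token)."""
    # Imported lazily so the rest of the module works without the client installed.
    import replicate

    output = replicate.run(
        model_ref(),
        input={
            "media": media_url,
            "prompt": prompt,
            "max_new_tokens": max_new_tokens,
        },
    )
    # The output schema on this page is a string; join defensively in case
    # the client hands back an iterator of chunks instead.
    return output if isinstance(output, str) else "".join(output)


if __name__ == "__main__":
    print(model_ref())
```

Pinning the version ID means later releases of the model do not silently change the output.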