lucataco/qwen2-vl-7b-instruct 🖼️📝🔢 → 📝

▶️ 651.4K runs 📅 Dec 2024 ⚙️ Cog 0.13.6

image-analysis image-to-text ocr video-to-text visual-understanding

Performance

6.2s Typical run time

651.4K Total runs

About

Latest model in the Qwen family for chatting about videos and images

Example Output

Prompt:

"Describe this video in detail."

Output

The video features a monkey riding a skateboard in a park. The monkey is wearing roller skates and is seen moving forward on the skateboard, with its arms outstretched for balance. The park is filled with people sitting on benches and walking around, enjoying the sunny day. The monkey appears to be enjoying itself, and the scene is filled with a sense of fun and playfulness. The park is surrounded by trees and greenery, creating a peaceful and relaxing atmosphere. Overall, the video captures a unique and entertaining moment of a monkey enjoying a day out in the park on roller skates.

Performance Metrics

6.24s Prediction Time

6.25s Total Time

All Input Parameters

{
  "media": "https://replicate.delivery/pbxt/MB8qw19bkjGGCTr8Px17db2ydBA3xrHyxBk5g5wRSEH0in9N/q2m-LO3Xg0vO0xmw.mp4",
  "prompt": "Describe this video in detail.",
  "max_new_tokens": 128
}

Input Parameters

media (required): Input image or video file. Type: string.
prompt: Custom prompt to guide the description. Type: string. Default: "Describe this in detail."
max_new_tokens: Maximum number of tokens to generate. Type: integer. Default: 128. Range: 1 - 512.
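
The parameters above can be checked locally before a request is sent. The following is a minimal sketch, not part of the model's own code: the helper name build_input and the constants are hypothetical, and only the defaults and the 1 - 512 range come from the listing above.

```python
# Hypothetical helper: builds the input payload described in "Input Parameters".
# Defaults and the token range are taken from the listing; the function name
# and constants are illustrative assumptions.

DEFAULT_PROMPT = "Describe this in detail."
DEFAULT_MAX_NEW_TOKENS = 128
MAX_NEW_TOKENS_RANGE = (1, 512)


def build_input(
    media: str,
    prompt: str = DEFAULT_PROMPT,
    max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
) -> dict:
    """Return a payload dict, raising if a value falls outside the documented limits."""
    # media is the only required field; it must be a non-empty string (e.g. a URL).
    if not isinstance(media, str) or not media:
        raise ValueError("media is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise TypeError("max_new_tokens must be an integer")
    low, high = MAX_NEW_TOKENS_RANGE
    if not low <= max_new_tokens <= high:
        raise ValueError(f"max_new_tokens must be between {low} and {high}")
    return {"media": media, "prompt": prompt, "max_new_tokens": max_new_tokens}
```

Calling build_input with only a media URL yields a payload that carries the documented defaults, which matches the shape of the "All Input Parameters" example.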

Output Schema

Output

Type: string

Version Details

Version ID: bf57361c75677fc33d480d0c5f02926e621b2caa2000347cb74aeae9d2ca07ee
Version Created: December 20, 2024
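
To pin a request to this exact version, the model name and version ID can be joined into a single reference. The sketch below assumes the third-party replicate Python client is installed and that REPLICATE_API_TOKEN is set in the environment. The helper names model_ref and describe are hypothetical, and the return value is assumed to be a string, as the Output Schema states.

```python
# Sketch under stated assumptions: requires `pip install replicate` and a
# REPLICATE_API_TOKEN environment variable for the network call in describe().
# model_ref() and describe() are illustrative names, not part of any library.

MODEL = "lucataco/qwen2-vl-7b-instruct"
VERSION = "bf57361c75677fc33d480d0c5f02926e621b2caa2000347cb74aeae9d2ca07ee"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Join owner/name and version ID into the 'owner/name:version' form."""
    return f"{model}:{version}"


def describe(media_url: str, prompt: str = "Describe this in detail.",
             max_new_tokens: int = 128):
    """Run one prediction against the pinned version and return its output."""
    # Imported lazily so that model_ref() stays usable without the client installed.
    import replicate

    return replicate.run(
        model_ref(),
        input={
            "media": media_url,
            "prompt": prompt,
            "max_new_tokens": max_new_tokens,
        },
    )
```

Pinning the version keeps results reproducible if the model owner later publishes a new version. Note that describe() performs a billed network request.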
