aodianyun/qwen2-vl-7b 🖼️🔢📝 → 📝
Example Output
Prompt:
"Describe the video."
Output
The video features a woman standing behind a podium, speaking to an audience while displaying slides on a screen in front of her. The slides contain text and images related to the topic being discussed by the speaker. The woman appears to be giving a presentation or lecture on a specific subject matter. The slides provide additional information and visual aids to support the speaker's points. The setting suggests that this is likely taking place in a formal environment such as a conference room or auditorium. Overall, the video captures a professional presentation with a focus on delivering informative content through both verbal communication and visual aids.
Performance Metrics
- Prediction Time: 9.51s
- Total Time: 346.45s
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/LXVISWYD8Od0I7w6EW5VIO3sycOIcukn6H26wrkaOX95RK7E/dod_classification_training.mp4",
  "width": 128,
  "height": 128,
  "prompt": "Describe the video.",
  "max_tokens": 128,
  "temperature": 0.7,
  "max_duration": 60,
  "repetition_penalty": 1.1
}
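A minimal sketch of submitting these inputs with the `replicate` Python client, assuming the package is installed and `REPLICATE_API_TOKEN` is set in the environment. The model reference joins the model name with the version ID listed under Version Details. The return type is not documented on this page, so the sketch simply prints whatever comes back.

```python
import os

# "owner/name:version" reference built from the model name and version ID on this page.
MODEL_REF = (
    "aodianyun/qwen2-vl-7b:"
    "9d135eb0b6b548c43fe0d5974abb7cb3c081d4a50de82d00406cc52b5a1954bc"
)

# Same values as the example input shown above.
EXAMPLE_INPUT = {
    "video": "https://replicate.delivery/pbxt/LXVISWYD8Od0I7w6EW5VIO3sycOIcukn6H26wrkaOX95RK7E/dod_classification_training.mp4",
    "width": 128,
    "height": 128,
    "prompt": "Describe the video.",
    "max_tokens": 128,
    "temperature": 0.7,
    "max_duration": 60,
    "repetition_penalty": 1.1,
}


def describe_video(model_input):
    """Send one prediction request and return the raw output."""
    # Imported here so the module loads even when the client is not installed.
    import replicate  # third-party: pip install replicate

    return replicate.run(MODEL_REF, input=model_input)


if __name__ == "__main__":
    # Only call the API when a token is available.
    if os.environ.get("REPLICATE_API_TOKEN"):
        print(describe_video(EXAMPLE_INPUT))
```

Because the example run reports a total time far longer than the prediction time, a caller may want to allow for a long wait on the first request.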
Input Parameters
- video (required): Video to process.
- width: Width for the video.
- height: Height for the video.
- prompt: Prompt to use for the video.
- max_tokens: Maximum number of tokens to generate.
- temperature: Temperature for the model (0.7 is a good default).
- max_duration: Maximum duration of the video in seconds (values above 360 may run out of VRAM).
- repetition_penalty: Repetition penalty for the model (1.1 is a good default).
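The parameters above can be assembled with a small helper. This is a sketch, not part of the model's API: `build_input` is a hypothetical name, the 0.7 and 1.1 defaults come from the descriptions above, and the remaining defaults (128 for `width`, `height`, and `max_tokens`, 60 for `max_duration`) are simply copied from the example input, since the page does not state the model's own defaults.

```python
import warnings


def build_input(
    video,
    prompt="Describe the video.",
    width=128,               # assumption: value taken from the example input
    height=128,              # assumption: value taken from the example input
    max_tokens=128,          # assumption: value taken from the example input
    temperature=0.7,         # suggested default from the parameter description
    max_duration=60,         # assumption: value taken from the example input
    repetition_penalty=1.1,  # suggested default from the parameter description
):
    """Return an input dict for the model, checking the documented constraints."""
    if not video:
        # "video" is the only parameter marked as required.
        raise ValueError("video is required")
    if max_duration > 360:
        # The parameter description warns about VRAM above 360 seconds.
        warnings.warn("max_duration above 360 seconds may run out of VRAM")
    return {
        "video": video,
        "width": width,
        "height": height,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "max_duration": max_duration,
        "repetition_penalty": repetition_penalty,
    }
```

For example, `build_input("https://example.com/clip.mp4", max_duration=120)` returns a dict with the same keys as the example input, with only the video URL and duration changed (the URL here is a placeholder).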
Output Schema
Output
Version Details
- Version ID: 9d135eb0b6b548c43fe0d5974abb7cb3c081d4a50de82d00406cc52b5a1954bc
- Version Created: September 7, 2024