lucataco/videollama3-7b 🔢🖼️📝 → 📝

▶️ 8.2K runs 📅 Feb 2025 ⚙️ Cog 0.13.7
video-analysis video-auto-captioning video-question-answering video-to-text

About

VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding

Example Output

Prompt:

"What is unusual in the video?"

Output

The unusual aspect of the video is that it shows bears, which are typically wild animals, engaging in a human-like activity such as eating sushi at a dining table.

Performance Metrics

4.01s Prediction Time
27.97s Total Time
All Input Parameters
{
  "fps": 1,
  "top_p": 0.9,
  "video": "https://replicate.delivery/pbxt/MV1tNGskZ6lDM0iDmHelOin3dAvOmsbSGQUW6KYhhwKiQMUT/bear.mp4",
  "prompt": "What is unusual in the video?",
  "max_frames": 180,
  "temperature": 0.2,
  "max_new_tokens": 2048
}
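The parameters above can be assembled and submitted programmatically. The following is a minimal sketch, not an official client: `build_input` and `run_prediction` are hypothetical helper names, and the call assumes the third-party `replicate` Python package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The model reference combines the model name with the version ID listed under Version Details.

```python
# Model name plus the version ID from the Version Details section.
MODEL_REF = (
    "lucataco/videollama3-7b:"
    "34a1f45f7068f7121a5b47c91f2d7e06c298850767f76f96660450a0a3bd5bbe"
)


def build_input(video_url, prompt, **overrides):
    """Return an input payload using the example values shown above.

    Hypothetical helper: optional parameters start from the example
    values and can be overridden by keyword.
    """
    payload = {
        "fps": 1,
        "top_p": 0.9,
        "video": video_url,
        "prompt": prompt,
        "max_frames": 180,
        "temperature": 0.2,
        "max_new_tokens": 2048,
    }
    unknown = set(overrides) - set(payload)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload.update(overrides)
    return payload


def run_prediction(payload):
    """Submit the payload to Replicate and return the model output.

    Assumes the `replicate` package is installed and that
    REPLICATE_API_TOKEN is set in the environment.
    """
    # Imported lazily so build_input works without the client installed.
    import replicate

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    example = build_input(
        "https://replicate.delivery/pbxt/MV1tNGskZ6lDM0iDmHelOin3dAvOmsbSGQUW6KYhhwKiQMUT/bear.mp4",
        "What is unusual in the video?",
    )
    print(example)  # pass to run_prediction(example) to call the model
```

Per the Output Schema, the result of `run_prediction` should be a single string.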
Input Parameters
fps Type: number, Default: 1, Range: 0 - 10
Frames per second to sample from video
top_p Type: number, Default: 0.9, Range: 0 - 1
Top-p sampling
video (required) Type: string
Input video file
prompt (required) Type: string
Text prompt to guide the model's response
max_frames Type: integer, Default: 180, Range: 0 - 256
Maximum number of frames to process
temperature Type: number, Default: 0.2, Range: 0 - 1
Sampling temperature
max_new_tokens Type: integer, Default: 2048, Range: 0 - 4096
Maximum number of tokens to generate
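The documented types and ranges can be checked client-side before a request is sent. This is a rough sketch under stated assumptions: `validate_input` and `estimated_frames` are hypothetical helpers, the range bounds are treated as inclusive, and the frame estimate assumes the model samples at `fps` and caps the result at `max_frames`, which the page does not state explicitly.

```python
# Ranges copied from the parameter list above; bounds assumed inclusive.
RANGES = {
    "fps": (0, 10),
    "top_p": (0, 1),
    "max_frames": (0, 256),
    "temperature": (0, 1),
    "max_new_tokens": (0, 4096),
}
INTEGER_FIELDS = {"max_frames", "max_new_tokens"}
REQUIRED_FIELDS = ("video", "prompt")


def validate_input(payload):
    """Return a list of problems found in payload (empty if none)."""
    errors = []
    for name in REQUIRED_FIELDS:
        value = payload.get(name)
        if not isinstance(value, str) or not value:
            errors.append(f"{name} is required and must be a non-empty string")
    for name, (low, high) in RANGES.items():
        if name not in payload:
            continue  # optional parameter; the server default applies
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name} must be a number")
            continue
        if name in INTEGER_FIELDS and not isinstance(value, int):
            errors.append(f"{name} must be an integer")
        if not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    return errors


def estimated_frames(duration_s, fps=1, max_frames=180):
    """Estimate how many frames are processed for a video of duration_s.

    ASSUMPTION: frames are sampled at `fps` and capped at `max_frames`.
    """
    return min(int(duration_s * fps), max_frames)
```

Under that assumption, with the defaults any video longer than 180 seconds reaches the `max_frames` cap, so raising `fps` alone would not add frames for long videos.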
Output Schema

Output

Type: string

Version Details
Version ID
34a1f45f7068f7121a5b47c91f2d7e06c298850767f76f96660450a0a3bd5bbe
Version Created
February 14, 2025