lucataco/videollama3-7b 🔢🖼️📝 → 📝

▶️ 34.5K runs 📅 Feb 2025 ⚙️ Cog 0.13.7

video-analysis video-auto-captioning video-question-answering video-to-text

About

VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding

Example Output

Prompt:

"What is unusual in the video?"

Output

The unusual aspect of the video is that it shows bears, which are typically wild animals, engaging in a human-like activity such as eating sushi at a dining table.

Performance Metrics

Prediction time: 4.01 s

Total time: 27.97 s

All Input Parameters

{
  "fps": 1,
  "top_p": 0.9,
  "video": "https://replicate.delivery/pbxt/MV1tNGskZ6lDM0iDmHelOin3dAvOmsbSGQUW6KYhhwKiQMUT/bear.mp4",
  "prompt": "What is unusual in the video?",
  "max_frames": 180,
  "temperature": 0.2,
  "max_new_tokens": 2048
}
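The `fps` and `max_frames` values in this example interact. The sampling rate sets how many frames a clip yields, and `max_frames` caps the total. The helper below estimates that count under a stated assumption: uniform sampling at `fps` starting at t = 0, truncated at `max_frames`. This page does not document the model's actual frame sampler, which might, for example, spread frames evenly across a long clip instead of truncating. Both function names are hypothetical.

```python
import math


def sampled_frame_count(duration_s: float, fps: float, max_frames: int) -> int:
    """Estimate how many frames the model sees.

    ASSUMPTION (not confirmed by the model card): one frame is taken at
    t = 0, 1/fps, 2/fps, ... while t < duration_s, and the result is
    capped at max_frames.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive for this estimate")
    # Number of timestamps k / fps that fall strictly before the clip ends.
    return min(math.ceil(duration_s * fps), max_frames)


def seconds_until_cap(fps: float, max_frames: int) -> float:
    """Clip length (seconds) beyond which max_frames, not fps, limits the count."""
    return max_frames / fps
```

Under that assumption, the example settings (fps 1, max_frames 180) let a clip of up to 180 seconds be sampled at the full rate. A longer clip would reach the 180-frame cap.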

Input Parameters

fps: number, default 1, range 0 to 10. Frames per second to sample from the video.
top_p: number, default 0.9, range 0 to 1. Top-p (nucleus) sampling threshold.
video (required): string. Input video file.
prompt (required): string. Text prompt to guide the model's response.
max_frames: integer, default 180, range 0 to 256. Maximum number of frames to process.
temperature: number, default 0.2, range 0 to 1. Sampling temperature.
max_new_tokens: integer, default 2048, range 0 to 4096. Maximum number of tokens to generate.
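These constraints can be enforced on the client before a request is sent. The sketch below is a minimal, hypothetical helper. `build_input` and `Param` are illustrative names, not part of Replicate or this model. The helper fills in the documented defaults, rejects unknown or missing fields, and checks each value against the listed type and range. It assumes "number" accepts both int and float values, which is an interpretation of the parameter list rather than a documented rule.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Param:
    kind: type                  # int, float ("number"), or str
    default: Any = None         # None marks a required field
    lo: Optional[float] = None  # inclusive lower bound, if any
    hi: Optional[float] = None  # inclusive upper bound, if any


# Constraints copied from the Input Parameters list; the structure is hypothetical.
PARAMS: Dict[str, Param] = {
    "fps": Param(float, 1, 0, 10),
    "top_p": Param(float, 0.9, 0, 1),
    "video": Param(str),
    "prompt": Param(str),
    "max_frames": Param(int, 180, 0, 256),
    "temperature": Param(float, 0.2, 0, 1),
    "max_new_tokens": Param(int, 2048, 0, 4096),
}


def build_input(**kwargs: Any) -> Dict[str, Any]:
    """Merge caller values with documented defaults and check types and ranges."""
    unknown = set(kwargs) - set(PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload: Dict[str, Any] = {}
    for name, spec in PARAMS.items():
        value = kwargs.get(name, spec.default)
        if value is None:
            raise ValueError(f"missing required parameter: {name}")
        if spec.kind is int:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an integer")
        elif spec.kind is float:
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number")
        elif not isinstance(value, str):
            raise TypeError(f"{name} must be a string")
        if spec.lo is not None and not (spec.lo <= value <= spec.hi):
            raise ValueError(f"{name}={value} is outside {spec.lo} to {spec.hi}")
        payload[name] = value
    return payload
```

For example, `build_input(video=url, prompt="What is unusual in the video?")` returns a dictionary equivalent to the example parameters shown earlier, while `max_frames=300` raises a `ValueError`.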

Output Schema

Output

Type: string

Version Details

Version ID: 34a1f45f7068f7121a5b47c91f2d7e06c298850767f76f96660450a0a3bd5bbe
Version Created: February 14, 2025
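One way to call this exact version from Python is Replicate's official `replicate` client, whose `replicate.run` function takes an `owner/model:version` reference and an `input` dictionary. The sketch below makes two assumptions: that the client is installed (`pip install replicate`) and that an API token is exported as `REPLICATE_API_TOKEN`. `ask_video` is a hypothetical wrapper name. According to the Output Schema the result should be a plain string, although the exact return type may vary between client releases.

```python
import os
from typing import Any, Dict

# Pin the version listed under Version Details so results stay reproducible.
MODEL_REF = (
    "lucataco/videollama3-7b:"
    "34a1f45f7068f7121a5b47c91f2d7e06c298850767f76f96660450a0a3bd5bbe"
)


def ask_video(video_url: str, prompt: str, **overrides: Any) -> Any:
    """Run one prediction against the pinned version (hypothetical wrapper)."""
    # Fail fast with a clear message instead of an authentication error.
    if "REPLICATE_API_TOKEN" not in os.environ:
        raise RuntimeError("set REPLICATE_API_TOKEN before calling the API")
    import replicate  # imported lazily so this module loads without the client

    payload: Dict[str, Any] = {"video": video_url, "prompt": prompt, **overrides}
    # Expected to be a string, per the Output Schema.
    return replicate.run(MODEL_REF, input=payload)
```

A call mirroring the example above would be `ask_video(video_url, "What is unusual in the video?", fps=1, max_frames=180, temperature=0.2)`. Omitted parameters take the server-side defaults.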
