lucataco/videollama3-7b 🔢🖼️📝 → 📝
About
VideoLLaMA 3: Frontier Multimodal Foundation Models for Video Understanding

Example Output
Prompt:
"What is unusual in the video?"
Output:
The unusual aspect of the video is that it shows bears, which are typically wild animals, engaging in a human-like activity such as eating sushi at a dining table.
Performance Metrics
- Prediction Time: 4.01s
- Total Time: 27.97s
All Input Parameters
{
  "fps": 1,
  "top_p": 0.9,
  "video": "https://replicate.delivery/pbxt/MV1tNGskZ6lDM0iDmHelOin3dAvOmsbSGQUW6KYhhwKiQMUT/bear.mp4",
  "prompt": "What is unusual in the video?",
  "max_frames": 180,
  "temperature": 0.2,
  "max_new_tokens": 2048
}
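The example input above could be submitted from code roughly as follows. This is a minimal sketch, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are hypothetical and exist only for illustration. The page does not state the output type, so the result is printed as returned.

```python
import os

MODEL = "lucataco/videollama3-7b"
# Version ID taken from the "Version Details" section of this page.
VERSION = "34a1f45f7068f7121a5b47c91f2d7e06c298850767f76f96660450a0a3bd5bbe"


def build_input(video_url, prompt, **overrides):
    """Build the input payload, starting from the example values on this page.

    Hypothetical helper: the values are the ones used in the example run,
    not documented model defaults.
    """
    payload = {
        "fps": 1,
        "top_p": 0.9,
        "video": video_url,
        "prompt": prompt,
        "max_frames": 180,
        "temperature": 0.2,
        "max_new_tokens": 2048,
    }
    # Reject names that are not listed under "Input Parameters".
    unknown = set(overrides) - set(payload)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload.update(overrides)
    return payload


def run_prediction(payload):
    """Send the payload to Replicate and return whatever the model outputs."""
    # Imported lazily so build_input works without the client installed.
    import replicate

    return replicate.run(f"{MODEL}:{VERSION}", input=payload)


# Only call the API when a token is available.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    result = run_prediction(
        build_input(
            "https://replicate.delivery/pbxt/MV1tNGskZ6lDM0iDmHelOin3dAvOmsbSGQUW6KYhhwKiQMUT/bear.mp4",
            "What is unusual in the video?",
        )
    )
    print(result)
```

Pinning the version ID keeps results reproducible if the model is updated later. Any listed parameter can be overridden by keyword, for example `build_input(url, prompt, temperature=0.0)`.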
Input Parameters
- fps: Frames per second to sample from video
- top_p: Top-p sampling
- video (required): Input video file
- prompt (required): Text prompt to guide the model's response
- max_frames: Maximum number of frames to process
- temperature: Sampling temperature
- max_new_tokens: Maximum number of tokens to generate
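The interaction between fps and max_frames can be estimated with simple arithmetic. The sketch below assumes the model samples the video at `fps` and then caps the result at `max_frames`. The page does not document the exact sampling logic, so treat this as a rough planning aid only. The function name `estimate_sampled_frames` is hypothetical.

```python
import math


def estimate_sampled_frames(duration_s, fps=1, max_frames=180):
    """Estimate how many frames are processed for a video of the given length.

    Assumption (not confirmed by the model page): frames are sampled at
    `fps` and the total is capped at `max_frames`.
    """
    if duration_s < 0 or fps <= 0 or max_frames <= 0:
        raise ValueError("duration_s must be >= 0; fps and max_frames must be > 0")
    return min(math.floor(duration_s * fps), max_frames)


# With the example settings (fps=1, max_frames=180):
one_minute = estimate_sampled_frames(60)    # below the cap
ten_minutes = estimate_sampled_frames(600)  # limited by max_frames
print(one_minute, ten_minutes)
```

Under this assumption, with the example settings any video longer than 180 seconds reaches the frame cap, so raising fps only adds frames for shorter clips.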
Output Schema
Output
Version Details
- Version ID: 34a1f45f7068f7121a5b47c91f2d7e06c298850767f76f96660450a0a3bd5bbe
- Version Created: February 14, 2025