n1jl0091/video-llava-7b-hf_replicate_n1jl0091
About
Upload an image or video, and Video-LLaVa will give you a text description of what it "sees."

Example Output
Output
In this video, a woman is standing in a kitchen and preparing food.
Performance Metrics
- Prediction Time: 4.07s
- Total Time: 131.50s
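The gap between the two metrics can be worked out directly. The sketch below only does the arithmetic on the figures reported above. Reading the difference as time spent outside the model's predict call (for example queueing or a cold start) is an assumption, since the page does not break the total down.

```python
# Figures copied from the Performance Metrics above.
prediction_time_s = 4.07
total_time_s = 131.50

# Time not accounted for by the prediction itself.
# Assumption: this is queueing/startup overhead; the page does not say.
overhead_s = round(total_time_s - prediction_time_s, 2)

# Share of the total that was spent predicting.
prediction_share = prediction_time_s / total_time_s

print(f"overhead: {overhead_s:.2f}s")
print(f"prediction share of total: {prediction_share:.1%}")
```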
All Input Parameters
{
  "top_p": 0.9,
  "videos": [
    "https://replicate.delivery/pbxt/Lzl3gqYd6ExXDlkvvpAwtQWhWzIOtCiYW1ztjoHvaVVFNEzt/3325978-hd_1920_1080_24fps.mp4"
  ],
  "prompts": [
    "What is happening in this video?"
  ],
  "num_frames": 10,
  "temperature": 0.1,
  "max_new_tokens": 500
}
Input Parameters
- top_p
- videos (required)
- prompts (required)
- num_frames
- temperature
- max_new_tokens
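The parameters listed above could be assembled into a request along these lines. This is a minimal sketch, not the model author's code: `build_input` and `describe` are hypothetical helpers, and `MODEL_REF` assumes the page title and Version ID combine into the usual `owner/name:version` reference. The only client call used is `replicate.run(ref, input=...)`, which requires the `replicate` package and a `REPLICATE_API_TOKEN`, so it is wrapped in a function that is not invoked here.

```python
import json

# Assumed reference format: page title + ":" + Version ID from this page.
MODEL_REF = (
    "n1jl0091/video-llava-7b-hf_replicate_n1jl0091:"
    "ff284eb7daa7ace568fe353efecc4728c1f1844771462d7ec3b4844741270ddf"
)

# Optional parameter names as listed on the page.
OPTIONAL = {"top_p", "num_frames", "temperature", "max_new_tokens"}


def build_input(videos, prompts, **optional):
    """Hypothetical helper: build the input dict and check required fields."""
    if not videos:
        raise ValueError("'videos' is required")
    if not prompts:
        raise ValueError("'prompts' is required")
    unknown = set(optional) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"videos": list(videos), "prompts": list(prompts), **optional}


def describe(payload):
    """Hypothetical wrapper around the Replicate client (network call)."""
    import replicate  # imported lazily so build_input works without it

    return replicate.run(MODEL_REF, input=payload)


# Values taken from the example under "All Input Parameters".
payload = build_input(
    videos=[
        "https://replicate.delivery/pbxt/Lzl3gqYd6ExXDlkvvpAwtQWhWzIOtCiYW1ztjoHvaVVFNEzt/3325978-hd_1920_1080_24fps.mp4"
    ],
    prompts=["What is happening in this video?"],
    top_p=0.9,
    num_frames=10,
    temperature=0.1,
    max_new_tokens=500,
)
print(json.dumps(payload, indent=2))
```

Calling `describe(payload)` would send the request. Whether each prompt is paired with the video at the same list position is not stated on the page, so the helper does not enforce matching lengths.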
Output Schema
Output
Example Execution Logs
Starting prediction for /tmp/tmpt59elgyh3325978-hd_1920_1080_24fps.mp4 at 18:03:50
Using 10 frames
Expanding inputs for image tokens in Video-LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.44.
Expanding inputs for image tokens in Video-LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.
Total prediction time for /tmp/tmpt59elgyh3325978-hd_1920_1080_24fps.mp4: 3.42s
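The frame count and timing can be pulled out of logs like the ones above with a couple of regular expressions. This is a sketch that assumes the log wording stays exactly as shown in this one example; `parse_log` is a hypothetical helper, and the deprecation warnings are left out of the sample string for brevity.

```python
import re

# Sample built from the first, second, and last log messages shown above.
LOG = (
    "Starting prediction for /tmp/tmpt59elgyh3325978-hd_1920_1080_24fps.mp4 at 18:03:50 "
    "Using 10 frames "
    "Total prediction time for /tmp/tmpt59elgyh3325978-hd_1920_1080_24fps.mp4: 3.42s"
)


def parse_log(text):
    """Hypothetical helper: extract frame count, file path, and seconds."""
    frames = re.search(r"Using (\d+) frames", text)
    total = re.search(r"Total prediction time for (\S+): ([\d.]+)s", text)
    return {
        "num_frames": int(frames.group(1)) if frames else None,
        "file": total.group(1) if total else None,
        "seconds": float(total.group(2)) if total else None,
    }


info = parse_log(LOG)
print(info)
```

Missing fields come back as `None` rather than raising, which keeps the helper usable on partial logs.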
Version Details
- Version ID: ff284eb7daa7ace568fe353efecc4728c1f1844771462d7ec3b4844741270ddf
- Version Created: November 18, 2024