n1jl0091/video-llava-7b-hf_replicate_n1jl0091
About
Upload an image or video, and Video-LLaVa will give you a text description of what it "sees."
Example Output
Output
In this video, a woman is standing in a kitchen and preparing food.
Performance Metrics
- Prediction Time: 4.07s
- Total Time: 131.50s
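The two timings above differ by a wide margin. As a small worked example, the sketch below computes how much of the total time fell outside the prediction itself. The page does not say what that remaining time consists of (queueing, model startup, or something else), so the code only reports the difference; the function name `overhead_seconds` is made up for this illustration.

```python
# Timings copied from the Performance Metrics above.
prediction_time_s = 4.07
total_time_s = 131.50


def overhead_seconds(total_s: float, prediction_s: float) -> float:
    """Return the time spent outside the prediction, rounded to 2 decimals."""
    if prediction_s > total_s:
        raise ValueError("prediction time cannot exceed total time")
    return round(total_s - prediction_s, 2)


overhead_s = overhead_seconds(total_time_s, prediction_time_s)
prediction_share = prediction_time_s / total_time_s

print(f"Time outside prediction: {overhead_s:.2f}s")
print(f"Prediction share of total: {prediction_share:.1%}")
```

For the run shown here, 127.43s of the 131.50s total was spent outside the prediction.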
All Input Parameters
{
  "top_p": 0.9,
  "videos": [
    "https://replicate.delivery/pbxt/Lzl3gqYd6ExXDlkvvpAwtQWhWzIOtCiYW1ztjoHvaVVFNEzt/3325978-hd_1920_1080_24fps.mp4"
  ],
  "prompts": [
    "What is happening in this video?"
  ],
  "num_frames": 10,
  "temperature": 0.1,
  "max_new_tokens": 500
}
Input Parameters
- top_p
- videos (required)
- prompts (required)
- num_frames
- temperature
- max_new_tokens
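The parameter list above can be turned into a small helper that assembles an input payload and checks the two required fields. This is only a sketch: the helper name `build_input` is hypothetical, and the default values simply mirror the example run on this page, since the page does not state the model's actual defaults.

```python
import json


def build_input(videos, prompts, num_frames=10, temperature=0.1,
                top_p=0.9, max_new_tokens=500):
    """Assemble an input payload with the six parameters listed on this page.

    Defaults are copied from the example run, not from documented defaults
    (an assumption made for this sketch).
    """
    # `videos` and `prompts` are the only parameters marked as required.
    if not videos:
        raise ValueError("videos is required and must not be empty")
    if not prompts:
        raise ValueError("prompts is required and must not be empty")
    return {
        "top_p": top_p,
        "videos": list(videos),
        "prompts": list(prompts),
        "num_frames": num_frames,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }


payload = build_input(
    videos=[
        "https://replicate.delivery/pbxt/Lzl3gqYd6ExXDlkvvpAwtQWhWzIOtCiYW1ztjoHvaVVFNEzt/3325978-hd_1920_1080_24fps.mp4"
    ],
    prompts=["What is happening in this video?"],
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary has the same shape as the example under All Input Parameters. It could be passed as the `input` argument of `replicate.run` in the Replicate Python client; that call is left out here because it needs an API token and network access.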
Output Schema
Output
Example Execution Logs
Starting prediction for /tmp/tmpt59elgyh3325978-hd_1920_1080_24fps.mp4 at 18:03:50
Using 10 frames
Expanding inputs for image tokens in Video-LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.44.
Expanding inputs for image tokens in Video-LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.
Total prediction time for /tmp/tmpt59elgyh3325978-hd_1920_1080_24fps.mp4: 3.42s
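The last log line reports its own timing (3.42s), which differs slightly from the 4.07s prediction time in the metrics. To compare such values across runs, the timing can be pulled out of the log text. The following is a minimal sketch that assumes the line format shown above stays stable; the pattern and the function name `parse_prediction_time` are illustrative and not part of the model's code.

```python
import re

# Matches lines such as:
#   Total prediction time for /tmp/example.mp4: 3.42s
LOG_PATTERN = re.compile(
    r"Total prediction time for (?P<path>.+): (?P<seconds>\d+(?:\.\d+)?)s$"
)


def parse_prediction_time(log_text):
    """Return (path, seconds) from the first matching log line, else None."""
    for line in log_text.splitlines():
        match = LOG_PATTERN.search(line.strip())
        if match:
            return match.group("path"), float(match.group("seconds"))
    return None


sample_log = (
    "Using 10 frames\n"
    "Total prediction time for "
    "/tmp/tmpt59elgyh3325978-hd_1920_1080_24fps.mp4: 3.42s\n"
)
result = parse_prediction_time(sample_log)
print(result)
```

The function returns `None` when no such line is present, so callers can tell a missing timing apart from a zero one.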
Version Details
- Version ID: ff284eb7daa7ace568fe353efecc4728c1f1844771462d7ec3b4844741270ddf
- Version Created: November 18, 2024