Model: bytedance/sa2va-4b-video
Segment objects in videos from natural-language instructions. Accepts a video and a text instruction (referring expressi...
video-segmentation • video-grounding • 48 runs