Model: bytedance/sa2va-4b-video
Segment objects in a video from natural-language instructions. Takes a video and a text prompt (referring expression) an...
Tags: video-segmentation, visual-grounding
Runs: 43