meta/sam-2-video
Segment objects in videos from interactive point prompts, labels, and object IDs, outputting per-frame masks as a video...
Segment objects in videos from natural-language instructions. Takes a video and a text instruction (referring expression...
Segment objects in video from a natural-language instruction. Provide a video and a text prompt and receive a masked vid...
Track multiple objects across video, associating detections over a long temporal window (up to 32 frames) to produce glo...
Track people in video and return an annotated video with bounding boxes and persistent track IDs using person detection...
Detect active speakers in a video. Input a video and get JSON metadata and optional annotated videos that localize which...