lucataco/sam3-video 🖼️📝✓🔢 → 🖼️
About
A unified foundation model for prompt-based segmentation in images and videos
Example Output
Prompt: "person"
Performance Metrics
- Prediction Time: 34.15s
- Total Time: 34.17s
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/O8DMHHE2jFAQ87PgxoouktqkBScfiYepIDPkH75QoEbYJISI/foot.mp4",
  "prompt": "person",
  "mask_only": false,
  "mask_color": "green",
  "return_zip": false,
  "mask_opacity": 0.5
}
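The parameters above can be sent to the model programmatically. The following is a minimal sketch, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are illustrative and not part of the model or the client. The version ID is the one listed under Version Details.

```python
# Model reference: owner/name from the page title, version ID from "Version Details".
MODEL_REF = (
    "lucataco/sam3-video:"
    "8cbab4c2a3133e679b5b863b80527f6b5c751ec7b33681b7e0b7c79c749df961"
)


def build_input(video_url: str, prompt: str = "person") -> dict:
    """Return an input payload mirroring the example parameters above."""
    return {
        "video": video_url,
        "prompt": prompt,
        "mask_only": False,
        "mask_color": "green",
        "return_zip": False,
        "mask_opacity": 0.5,
    }


def run_prediction(payload: dict):
    """Send the payload to Replicate.

    Assumes the `replicate` package is installed and REPLICATE_API_TOKEN is set.
    The import is lazy so build_input can be used without the client.
    """
    import replicate

    return replicate.run(MODEL_REF, input=payload)
```

Calling `run_prediction(build_input("<video URL>"))` would submit the job. The shape of the returned value is not documented on this page, so inspect it before relying on it.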
Input Parameters
- video (required): Input video file.
- prompt: Text prompt for segmentation.
- mask_only: If True, returns a black-and-white mask video instead of an overlay on the original video.
- mask_color: Color of the mask overlay. Options: 'green', 'red', 'blue', 'yellow', 'cyan', 'magenta'.
- return_zip: If True, returns a ZIP file containing individual frame masks as PNGs.
- mask_opacity: Opacity of the mask overlay (0.0 to 1.0).
- visual_prompt: Optional. JSON string defining visual prompts (points/labels) or bounding boxes.
- negative_prompt: Optional. Text prompt for objects to exclude.
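The constraints listed above (a required video, a fixed set of mask colors, an opacity between 0.0 and 1.0) can be checked on the client before submitting a job. This is a sketch only: `validate_input` is a hypothetical helper, and the model may apply its own, possibly different, validation server-side. The JSON layout of `visual_prompt` is not documented here, so it is deliberately left unchecked.

```python
ALLOWED_MASK_COLORS = {"green", "red", "blue", "yellow", "cyan", "magenta"}


def validate_input(params: dict) -> list[str]:
    """Return a list of problems found in `params`; an empty list means none were found."""
    problems = []

    # `video` is the only parameter marked as required.
    if not params.get("video"):
        problems.append("video is required")

    color = params.get("mask_color")
    if color is not None and color not in ALLOWED_MASK_COLORS:
        problems.append(f"mask_color must be one of {sorted(ALLOWED_MASK_COLORS)}")

    opacity = params.get("mask_opacity")
    if opacity is not None and not 0.0 <= opacity <= 1.0:
        problems.append("mask_opacity must be between 0.0 and 1.0")

    # The two switches are documented as True/False values.
    for flag in ("mask_only", "return_zip"):
        if flag in params and not isinstance(params[flag], bool):
            problems.append(f"{flag} must be a boolean")

    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem at once.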
Output Schema
Output
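When `return_zip` is True, the output is described as a ZIP file of per-frame PNG masks. The sketch below lists those frames using only the standard library. The file names inside the archive are not documented on this page, so the code assumes nothing beyond a `.png` extension, and `list_mask_frames` is an illustrative name.

```python
import io
import zipfile


def list_mask_frames(zip_bytes: bytes) -> list[str]:
    """Return the names of the PNG entries in the archive, sorted by name."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".png")
        )
```

Sorting by name recovers frame order only if the archive uses zero-padded frame numbers, which is an assumption to verify against a real output.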
Example Execution Logs
Processing video: /tmp/tmpbnsq0otsfoot.mp4
Loaded 152 frames. FPS: 25.0
Adding text prompt: 'person'
Running inference...
0%| | 0/152 [00:00<?, ?it/s]
25%|██▌ | 38/152 [00:06<00:19, 5.81it/s]
50%|█████ | 76/152 [00:13<00:14, 5.36it/s]
75%|███████▌ | 114/152 [00:20<00:07, 5.17it/s]
100%|██████████| 152/152 [00:28<00:00, 5.38it/s]
Saving output video to /tmp/output.mp4...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Video saved.
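The figures in the log can be turned into a rough throughput estimate. The arithmetic below assumes that one progress-bar iteration corresponds to one video frame, which is consistent with the 152 iterations and 152 loaded frames but is not stated explicitly.

```python
frames = 152      # "Loaded 152 frames"
fps = 25.0        # "FPS: 25.0"
avg_rate = 5.38   # final progress-bar average, in iterations per second

clip_seconds = frames / fps                  # length of the input clip
inference_seconds = frames / avg_rate        # time spent in the inference loop
slowdown = inference_seconds / clip_seconds  # processing seconds per second of video

print(
    f"clip: {clip_seconds:.2f}s, inference: {inference_seconds:.2f}s, "
    f"ratio: {slowdown:.1f}x"
)
# prints "clip: 6.08s, inference: 28.25s, ratio: 4.6x"
```

On this run, inference accounts for about 28 s of the 34.15 s prediction time. The remaining roughly 6 s presumably covers loading the video and saving the output.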
Version Details
- Version ID: 8cbab4c2a3133e679b5b863b80527f6b5c751ec7b33681b7e0b7c79c749df961
- Version Created: November 26, 2025