lucataco/sam3-video

Runs: 3.3K | Date: Nov 2025 | Cog 0.16.9 | Source: GitHub
Tag: video-segmentation

About

A unified foundation model for prompt-based segmentation in images and videos

Example Output

Prompt: "person"

Performance Metrics

Prediction time: 34.15 s
Total time: 34.17 s
All Input Parameters
{
  "video": "https://replicate.delivery/pbxt/O8DMHHE2jFAQ87PgxoouktqkBScfiYepIDPkH75QoEbYJISI/foot.mp4",
  "prompt": "person",
  "mask_only": false,
  "mask_color": "green",
  "return_zip": false,
  "mask_opacity": 0.5
}
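The example input above can also be assembled in code. The Python sketch below builds and checks a payload against the parameters documented on this page, then reproduces the example. It assumes the official `replicate` Python client and its `replicate.run(ref, input=...)` call, and it assumes the model reference is the owner/name joined to the Version ID listed under Version Details. The helper names `build_input` and `run_prediction` are hypothetical, and the network call sits inside a function that is never invoked here, so the rest runs offline.

```python
import json

# Assumption: owner/name plus the Version ID from Version Details forms a valid
# model reference for replicate.run.
MODEL_REF = (
    "lucataco/sam3-video:"
    "8cbab4c2a3133e679b5b863b80527f6b5c751ec7b33681b7e0b7c79c749df961"
)

# The mask_color options listed in Input Parameters.
MASK_COLORS = {"green", "red", "blue", "yellow", "cyan", "magenta"}


def build_input(video, prompt="person", mask_only=False, mask_color="green",
                return_zip=False, mask_opacity=0.5, negative_prompt=None,
                visual_prompt=None):
    """Hypothetical helper: return an input dict using the documented defaults."""
    if not video:
        raise ValueError("video is required")
    if mask_color not in MASK_COLORS:
        raise ValueError(f"mask_color must be one of {sorted(MASK_COLORS)}")
    if not 0.0 <= mask_opacity <= 1.0:
        raise ValueError("mask_opacity must be between 0 and 1")
    payload = {
        "video": video,
        "prompt": prompt,
        "mask_only": mask_only,
        "mask_color": mask_color,
        "return_zip": return_zip,
        "mask_opacity": mask_opacity,
    }
    # Optional parameters are only sent when the caller sets them.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if visual_prompt is not None:
        # visual_prompt is documented as a JSON *string*; its schema is not given
        # on this page, so the caller supplies the structure.
        payload["visual_prompt"] = (
            visual_prompt if isinstance(visual_prompt, str) else json.dumps(visual_prompt)
        )
    return payload


def run_prediction(payload):
    """Hypothetical wrapper; needs the replicate package and REPLICATE_API_TOKEN."""
    import replicate  # imported lazily so this sketch runs without the client
    return replicate.run(MODEL_REF, input=payload)


# Reproduce the "All Input Parameters" example from this page.
example = build_input(
    "https://replicate.delivery/pbxt/O8DMHHE2jFAQ87PgxoouktqkBScfiYepIDPkH75QoEbYJISI/foot.mp4"
)
print(json.dumps(example, indent=2))
```

Validating locally keeps obvious mistakes, such as an unsupported color name or an opacity outside 0 to 1, from costing a round trip to the API.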
Input Parameters
video (required; type: string)
Input video file.
prompt (type: string; default: "person")
Text prompt describing the objects to segment.
mask_only (type: boolean; default: false)
If true, returns a black-and-white mask video instead of an overlay on the original video.
mask_color (type: string; default: "green")
Color of the mask overlay. Options: "green", "red", "blue", "yellow", "cyan", "magenta".
return_zip (type: boolean; default: false)
If true, returns a ZIP file containing the individual frame masks as PNG images.
mask_opacity (type: number; default: 0.5; range: 0 to 1)
Opacity of the mask overlay, from 0.0 (fully transparent) to 1.0 (fully opaque).
visual_prompt (type: string; optional)
JSON string defining visual prompts (points and labels) or bounding boxes.
negative_prompt (type: string; optional)
Text prompt describing objects to exclude.
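The page does not say exactly how `mask_color` and `mask_opacity` are combined. A common technique is per-pixel alpha blending: inside the mask, out = (1 - opacity) * frame + opacity * color, and pixels outside the mask are left unchanged. The Python sketch below illustrates that technique under two assumptions: that the model blends this way, and the RGB values chosen for each named color (hypothetical). `mask_only_pixel` also assumes the black-and-white output marks the mask in white.

```python
# Hypothetical RGB values for the documented mask_color options; the model's
# actual palette is not specified on this page.
COLOR_RGB = {
    "green": (0, 255, 0),
    "red": (255, 0, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "cyan": (0, 255, 255),
    "magenta": (255, 0, 255),
}


def blend_pixel(pixel, in_mask, mask_color="green", mask_opacity=0.5):
    """Alpha-blend one RGB pixel with the mask color if it lies inside the mask."""
    if not in_mask:
        return pixel  # outside the mask the original frame shows through
    color = COLOR_RGB[mask_color]
    return tuple(
        round((1 - mask_opacity) * p + mask_opacity * c)
        for p, c in zip(pixel, color)
    )


def mask_only_pixel(in_mask):
    """Assumed mask_only=true rendering: white inside the mask, black outside."""
    return (255, 255, 255) if in_mask else (0, 0, 0)


print(blend_pixel((100, 100, 100), True))
print(blend_pixel((100, 100, 100), True, mask_color="red", mask_opacity=1.0))
```

At opacity 0 the overlay disappears, and at opacity 1 masked pixels take the pure mask color, which matches the documented 0.0 to 1.0 range.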
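Output Schema

Type: string (format: uri)
A URL pointing to the output file: the overlay or mask video by default, or the ZIP archive of frame masks when return_zip is true.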
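Because the output is a single URI whichever mode is used, a client has to check what it downloaded. The Python sketch below uses only the standard library (`urllib.request`, `zipfile`): it fetches the URI and, if the bytes form a ZIP archive, lists the PNG frame masks inside. The naming scheme of files in the archive is not documented here, so the demo names `frame_00000.png` and `frame_00001.png` are hypothetical, and plain name sorting assumes zero-padded frame numbers.

```python
import io
import urllib.request
import zipfile


def download(uri):
    """Fetch the file behind the output URI (requires network access)."""
    with urllib.request.urlopen(uri) as resp:
        return resp.read()


def list_frame_masks(data):
    """Return sorted PNG entry names if `data` is a ZIP archive, else an empty list."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return []  # e.g. the default overlay or mask video
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Lexicographic order is only frame order if names are zero-padded.
        return sorted(n for n in zf.namelist() if n.lower().endswith(".png"))


# Offline demo with a small in-memory archive standing in for a real download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("frame_00001.png", b"placeholder")
    zf.writestr("frame_00000.png", b"placeholder")
demo_zip = buf.getvalue()
print(list_frame_masks(demo_zip))
```

In practice, `list_frame_masks(download(output_uri))` would be called on the URI the model returns.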

Example Execution Logs
Processing video: /tmp/tmpbnsq0otsfoot.mp4
Loaded 152 frames. FPS: 25.0
Adding text prompt: 'person'
Running inference...
100%|██████████| 152/152 [00:28<00:00,  5.38it/s]
Saving output video to /tmp/output.mp4...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Video saved.
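The log and the performance metrics together give a rough sense of throughput. The short Python calculation below uses only figures from this run: 152 frames at 25 FPS, tqdm's final average of 5.38 it/s for the inference loop, and the 34.15 s prediction time. Treating one tqdm iteration as one frame is an assumption consistent with the 152/152 counter, and the remaining time is only a rough estimate of non-inference work such as loading the video and writing the output.

```python
# Figures taken from this example run.
frames = 152          # "Loaded 152 frames"
fps = 25.0            # "FPS: 25.0"
tqdm_rate = 5.38      # final tqdm average, in iterations (frames) per second
prediction_s = 34.15  # reported prediction time, in seconds

clip_s = frames / fps                      # length of the input clip in seconds
inference_s = frames / tqdm_rate           # approximate time spent in the inference loop
overhead_s = prediction_s - inference_s    # everything else: loading, encoding, saving
slowdown = prediction_s / clip_s           # wall-clock seconds per second of video

print(f"clip length:     {clip_s:.2f} s")       # about 6.08 s
print(f"inference loop:  {inference_s:.2f} s")  # about 28.25 s
print(f"other work:      {overhead_s:.2f} s")   # about 5.90 s
print(f"slowdown factor: {slowdown:.2f}x")      # about 5.62x
```

On this run, processing took a little over five and a half seconds per second of footage, so longer clips should be expected to scale roughly in proportion to their frame count.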
Version Details
Version ID: 8cbab4c2a3133e679b5b863b80527f6b5c751ec7b33681b7e0b7c79c749df961
Version created: November 26, 2025