mattsays/sam3-image 🖼️📝✓🔢 → 🖼️
About
A unified foundation model for prompt-based segmentation in images and videos
Example Output
Prompt: "clothes"
Output: segmentation overlay (image omitted)
Performance Metrics
- Prediction Time: 1.42s
- Total Time: 68.72s
All Input Parameters
```json
{
  "image": "https://replicate.delivery/pbxt/OLiTVqA6MghjGm2H0q3EZMCQSxn63ZZl5rKlCy719bEHhMsD/pexels-godisable-jacob-226636-794064.jpg",
  "prompt": "clothes",
  "mask_only": false,
  "threshold": 0.5,
  "mask_color": "green",
  "return_zip": true,
  "mask_opacity": 0.5,
  "save_overlay": false
}
```
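The request above can be reproduced with Replicate's Python client. The sketch below assembles the same input payload with a small validation helper (the helper is my illustration, not part of the model); the `replicate.run` call is left commented out since it requires a `REPLICATE_API_TOKEN` and a network connection.

```python
# Sketch of invoking this model via the Replicate Python client.
# The build_input helper is illustrative; only the final dict matters.

VALID_COLORS = {"green", "red", "blue", "yellow", "cyan", "magenta"}

def build_input(image_url, prompt, threshold=0.5, mask_color="green",
                mask_opacity=0.5, mask_only=False, return_zip=True,
                save_overlay=False):
    """Assemble and sanity-check the input dict shown above."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.0, 1.0]")
    if not 0.0 <= mask_opacity <= 1.0:
        raise ValueError("mask_opacity must be in [0.0, 1.0]")
    if mask_color not in VALID_COLORS:
        raise ValueError(f"mask_color must be one of {sorted(VALID_COLORS)}")
    return {
        "image": image_url,
        "prompt": prompt,
        "mask_only": mask_only,
        "threshold": threshold,
        "mask_color": mask_color,
        "return_zip": return_zip,
        "mask_opacity": mask_opacity,
        "save_overlay": save_overlay,
    }

payload = build_input(
    "https://replicate.delivery/pbxt/OLiTVqA6MghjGm2H0q3EZMCQSxn63ZZl5rKlCy719bEHhMsD/pexels-godisable-jacob-226636-794064.jpg",
    "clothes",
)

# import replicate
# output = replicate.run("mattsays/sam3-image", input=payload)
```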
Input Parameters
- image (required): Input image file
- prompt: Text prompt for segmentation
- mask_only: If true, returns a black-and-white mask image instead of an overlay on the original image
- threshold: Confidence threshold for object detection
- mask_color: Color of the mask overlay. Options: 'green', 'red', 'blue', 'yellow', 'cyan', 'magenta'
- return_zip: If true, returns a ZIP file containing individual masks as PNGs
- mask_opacity: Opacity of the mask overlay (0.0 to 1.0)
- save_overlay: If true, includes the overlay image in the ZIP file
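The `mask_color` and `mask_opacity` parameters describe a standard alpha blend: a solid color is painted over the original pixels wherever the mask is set, weighted by the opacity. A minimal NumPy sketch of that blend (my illustration of the parameter semantics, not the model's actual rendering code):

```python
import numpy as np

# Illustrative alpha blend: paint `color` over `image` where `mask` is True,
# at the given opacity. Mirrors what mask_color/mask_opacity describe.
COLORS = {"green": (0, 255, 0), "red": (255, 0, 0), "blue": (0, 0, 255),
          "yellow": (255, 255, 0), "cyan": (0, 255, 255), "magenta": (255, 0, 255)}

def overlay_mask(image, mask, mask_color="green", mask_opacity=0.5):
    """image: HxWx3 uint8, mask: HxW bool -> blended HxWx3 uint8."""
    color = np.array(COLORS[mask_color], dtype=np.float32)
    out = image.astype(np.float32)
    # Blend only the masked pixels; unmasked pixels pass through unchanged.
    out[mask] = (1.0 - mask_opacity) * out[mask] + mask_opacity * color
    return out.astype(np.uint8)

# Tiny demo: a uniform gray image with two masked pixels.
img = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[True, False], [False, True]])
blended = overlay_mask(img, mask)
```

With `mask_opacity=0.5`, a masked gray pixel (100, 100, 100) blends with green (0, 255, 0) to roughly (50, 177, 50), while unmasked pixels are left untouched.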
Output Schema
Example Execution Logs
```
Processing image: /tmp/tmpvc4tkcugpexels-godisable-jacob-226636-794064.jpg
Adding text prompt: 'clothes'
Input keys: ['pixel_values', 'original_sizes', 'input_ids', 'attention_mask']
pixel_values: shape=torch.Size([1, 3, 1008, 1008]), dtype=torch.float32
original_sizes: shape=torch.Size([1, 2]), dtype=torch.int64
input_ids: shape=torch.Size([1, 32]), dtype=torch.int64
attention_mask: shape=torch.Size([1, 32]), dtype=torch.int64
Running inference on cuda with torch.bfloat16...
Inference complete! Found 2 objects
```
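With `return_zip` enabled, the model returns a ZIP archive of per-object mask PNGs (two objects in the run logged above). A hedged sketch of unpacking such an archive with the standard library; the entry names used in the demo are my assumption, not documented by the model:

```python
import io
import zipfile

def read_masks(zip_bytes):
    """Return {filename: png_bytes} for every PNG entry in the archive."""
    masks = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".png"):
                masks[name] = zf.read(name)
    return masks

# Build a stand-in archive with two fake entries to demonstrate usage.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mask_0.png", b"fake-png-0")
    zf.writestr("mask_1.png", b"fake-png-1")

masks = read_masks(buf.getvalue())
```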
Version Details
- Version ID: d73db077226443ba4fafd34e233b3626b552eac2a433f90c7c32a9ac89bd9e72
- Version Created: January 3, 2026