bytedance/sa2va-4b-image 🖼️📝 → ❓

▶️ 102 runs 📅 Feb 2025 ⚙️ Cog 0.13.8-dev+gdeaa413.d20250220 🔗 GitHub 📄 Paper ⚖️ License
image-segmentation image-to-text

About

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Example Output

Output

Performance Metrics

1.07s Prediction Time
69.27s Total Time
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/MXVMYKMbmDEtKqsegGtgTgmZQAhDRXydmVnt0tRA65Cr8L3H/replicate-prediction-vc34d0cgt9rme0cn57f8qqzp8m.webp",
  "instruction": "segment the snowboarder"
}
Input Parameters
image (required) Type: string
Input image for segmentation
instruction (required) Type: string
Text instruction for the model
Output Schema
img Type: stringFormat: uri
Img
response Type: string
Response
Example Execution Logs
propagate in video:   0%|          | 0/1 [00:00<?, ?it/s]
propagate in video: 100%|██████████| 1/1 [00:00<00:00, 5440.08it/s]
Version Details
Version ID
ccf8e58352ce5b2a6cbf5cb68b947ca3bc62009ce66e82071eeae14555414411
Version Created
February 22, 2025
Run on Replicate →