bytedance/sa2va-8b-image (image + text → image + text)

48.4K runs · Published Feb 2025 · Cog 0.13.8-dev+gdeaa413.d20250220

Tags: image-segmentation, image-to-text, visual-grounding, visual-question-answering, visual-understanding

Performance

Typical run time: 0.9 s

Total runs: 48.4K

About

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Example Output


{"img":"https://replicate.delivery/xezq/94rYVsLtK0IfLiKPefc5zxeeXhFcGf0acrzUNY6uVSfVLa4IKA/output.png","response":"Sure, [SEG] .<|im_end|>"}

Performance Metrics

Prediction time: 0.87 s

Total time: 0.88 s

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/MXdtc5yJDPoUGs6li6sYevHiNXWJjaD9O4kvCwYIAIWTHWsG/replicate-prediction-1spvj2jc8hrm80cn5f6t1xxg4m.webp",
  "instruction": "segment the giraffe"
}

Input Parameters

image (required, string): Input image for segmentation, passed as a URL in the example above
instruction (required, string): Text instruction for the model
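As a minimal sketch, assuming the official `replicate` Python client is installed (`pip install replicate`) and `REPLICATE_API_TOKEN` is set in the environment, the two required inputs can be assembled and sent to the version listed under Version Details. The helper names `build_input` and `run_sa2va` are hypothetical, and the non-empty checks are an assumption about sensible inputs rather than rules enforced by the model.

```python
from typing import Any, Dict

# Version ID from the "Version Details" section; pins this exact model version.
MODEL_REF = (
    "bytedance/sa2va-8b-image:"
    "956baf05a8a81ab47f1d0dac8eab6585b899790f342975a964840c4e9c63c7aa"
)


def build_input(image: str, instruction: str) -> Dict[str, str]:
    """Assemble the payload with the two required string fields (hypothetical helper)."""
    for name, value in (("image", image), ("instruction", instruction)):
        # Assumption: empty or whitespace-only values are treated as invalid.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    return {"image": image, "instruction": instruction}


def run_sa2va(image: str, instruction: str) -> Any:
    """Run one prediction (assumes `replicate` is installed and REPLICATE_API_TOKEN is set)."""
    import replicate  # lazy import: build_input() works without the client installed

    return replicate.run(MODEL_REF, input=build_input(image, instruction))
```

Passing the image URL and instruction from the example input above to `run_sa2va` should return an object with the `img` and `response` fields described in the output schema. Importing the client lazily keeps the payload-building logic usable (and testable) without network access.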

Output Schema

img (string, URI format): URL of the output image
response (string): The model's text response
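The `response` string in the example output ends with the chat template's `<|im_end|>` token and contains a `[SEG]` placeholder where the model refers to a segmentation. A hypothetical post-processing helper might strip the end token and count `[SEG]` occurrences; handling these tokens this way is an assumption drawn only from the example output, not documented behavior.

```python
import json
from typing import Any, Dict, Union

# Tokens seen in the example response; how they are handled here is an assumption.
END_TOKEN = "<|im_end|>"
SEG_TOKEN = "[SEG]"


def parse_output(raw: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Normalize an output with `img` and `response` fields (hypothetical helper)."""
    # Accept either the raw JSON text or an already-decoded mapping.
    data = json.loads(raw) if isinstance(raw, str) else dict(raw)
    # Drop the end-of-turn token and surrounding whitespace from the text.
    text = data.get("response", "").replace(END_TOKEN, "").strip()
    return {
        "image_url": data.get("img"),             # output image URI
        "text": text,                             # cleaned text response
        "num_seg_tokens": text.count(SEG_TOKEN),  # number of [SEG] placeholders
    }
```

Applied to the example output, this yields the text `Sure, [SEG] .` with one `[SEG]` placeholder, alongside the output image URL.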

Example Execution Logs

propagate in video:   0%|          | 0/1 [00:00<?, ?it/s]
propagate in video: 100%|██████████| 1/1 [00:00<00:00, 5599.87it/s]

Version Details

Version ID: 956baf05a8a81ab47f1d0dac8eab6585b899790f342975a964840c4e9c63c7aa
Version Created: February 22, 2025
