visual-grounding AI Models

Automate GUI interactions by predicting where to click from a screenshot and a natural-language command. Takes a GUI scr...

gui-automation • visual-grounding • 28 runs

Segment objects in videos from natural-language instructions. Accepts a video and a text instruction (referring expressi...

🎥 • video-segmentation • video-grounding • 48 runs

Analyze images from text instructions for visual understanding and object segmentation. Combines SAM2 segmentati...

🖼️ → 📝 • image-to-text • image-segmentation • visual-understanding • 48.3K runs

Segment objects in images from natural-language instructions and answer visual questions. Provide an image plus a text i...

🖼️ → 📝 • image-segmentation • image-to-text • 6.4K runs

Segment objects and regions in images using natural language instructions. Accepts an image and a text instruction and r...

🖼️ • image-segmentation • visual-grounding • referring-segmentation • 132 runs