w95/tinyclick
Automate GUI interactions by predicting where to click from a screenshot and a natural-language command. Takes a GUI scr...
Segment objects in videos from natural-language instructions. Accepts a video and a text instruction (referring expressi...
Segment objects in images from natural-language instructions and answer grounded visual questions. Takes an image and a...
Segment objects in images from natural-language instructions and answer visual questions. Provide an image plus a text i...
Segment objects and regions in images using natural language instructions. Accepts an image and a text instruction and r...