
nomagick/qwen-vl-chat
Analyze images in conversational chat to answer questions, caption scenes, and localize objects with bounding boxes. Acc...
Found 19 models (showing 1-19)
Analyze images in conversational chat to answer questions, caption scenes, and localize objects with bounding boxes. Acc...
Detect objects in images from natural-language queries. Accepts an image and comma-separated text queries (category name...
Detect objects in an input image and return the image annotated with bounding boxes, class labels, and optional confiden...
Detect and localize vehicle damage from an input image, returning an annotated image that highlights affected areas for...
Detect user interface components in screenshots. Accepts an input image of a screen or app UI and returns an annotated i...
Detect UI layout sections and components in screenshots. Takes an input image of a webpage or app interface and returns...
Parse UI screenshots into structured UI elements for agentic automation. Accepts an image of a desktop, mobile, or web i...
Automate GUI actions from a screenshot and a natural-language command. Takes a GUI screenshot image and a text instructi...
Generate grounded image captions from an input image, linking phrases to detected objects with bounding boxes. Accepts a...
Analyze images to caption content, detect objects, segment regions, and extract text (OCR). Accepts an image and an opti...
Detect objects in images and return bounding boxes, class labels, and confidence scores. Provide an image and set a conf...
Tag and segment objects in images, returning labels, bounding boxes, and pixel masks. Accepts an image as input and outp...
Execute computer vision tasks from natural-language instructions on an input image and return an image that visualizes t...
Detect anime faces in an input image and return bounding boxes. Outputs YOLO-format coordinates (txt) and an annotated i...
Caption images, detect objects, and extract text from an input image, returning text outputs. Accepts an image plus a ta...
Analyze images to generate captions, detect objects with bounding boxes, and extract text (OCR). Accepts an image plus a...
Detect brand logos in images. Takes an image input and locates logos, returning an output image with bounding boxes, cla...
Detect objects in images from natural-language queries. Takes an image and comma-separated text descriptions of target o...
Detect objects from arbitrary, user-defined categories in images and videos in real time. Takes an image or video plus a...