image-object-detection AI Models

nomagick/qwen-vl-chat

Generates text responses from text prompts and images, with a ChatML prompt interface and streaming support. Accepts up...

🖼️ → 📝 • text-generation • image-to-text • image-analysis • 1.1K runs

adirik/grounding-dino

Detect objects in images using natural language queries. Accepts an image and comma-separated text queries (category nam...

🖼️ • image-object-detection • open-vocabulary • 18.5M runs

hilongjw/dora_page

Detect objects in an input image and return an annotated image with bounding boxes, class labels, and optional confidenc...

🖼️ • image-object-detection • 197 runs

invarrow/carai-offcial

Detect and localize vehicle damage from an input image, returning an annotated image that highlights affected areas for...

🖼️ • image-object-detection • automotive • 598 runs

hilongjw/screen-ui-detector

Detect user interface elements in screenshots and app/web screens, returning an annotated image with bounding boxes, lab...

🖼️ • image-object-detection • ui-detection • 64 runs

hilongjw/section-ui-detector

Detect UI layout sections and components in screenshots. Takes an input image of a webpage or app interface and returns...

🖼️ • image-object-detection • 45 runs

microsoft/omniparser-v2

Parse GUI screenshots into structured UI elements with bounding boxes and captions. Accepts an image of a desktop or mob...

🖼️ → 📝 • image-to-text • image-object-detection • ui-parsing • 185.5K runs

w95/tinyclick

Automate GUI interactions by predicting where to click from a screenshot and a natural-language command. Takes a GUI scr...

gui-automation • visual-grounding • 28 runs

lucataco/kosmos-2

Caption images with grounded object localization. Take an image as input and return a brief or detailed natural-language...

🖼️ → 📝 • image-to-text • image-object-detection • 1.9K runs

hiscodesmells/florence-2-base

Performs multiple computer vision tasks on images including captioning, object detection, OCR, and segmentation. Takes a...

🖼️ → 📝 • image-to-text • object-detection • ocr • 323 runs

adirik/codet

Detect objects in images and return bounding boxes, class names, and confidence scores. Accepts an image input and outpu...

🖼️ • image-object-detection • 1.6K runs

idea-research/ram-grounded-sam

Tag and segment objects in images, returning labels, bounding boxes, and pixel masks. Accepts an image as input and outp...

🖼️ → 📝 • image-object-detection • image-segmentation • image-to-text • 1.5M runs

cjwbw/instructcv

Execute computer vision tasks from natural-language instructions on an input image and return an image that visualizes t...

🖼️ → 🖼️ • image-to-image • image-object-detection • image-segmentation • 359 runs

shreejalmaharjan-27/anime-face-detector

Detect anime faces in images. Accepts an input image and returns YOLO-format face bounding boxes (class id, x_center, y_...

🖼️ • face-detection • image-object-detection • anime • 28.0K runs

lucataco/florence-2-base

Performs multiple vision and vision-language tasks based on text prompts. Supports image captioning with varying detail...

🖼️ → 📝 • image-to-text • object-detection • ocr • 133.5K runs

lucataco/florence-2-large

Analyze images to generate captions, detect objects, and extract text (OCR). Accepts an image plus a task selector and o...

🖼️ → 📝 • image-to-text • image-object-detection • ocr • 471.5K runs

hilongjw/logo_detector

Detect logos in images and return an annotated image. Configure detection score threshold and IoU filtering, toggle labe...

🖼️ • image-object-detection • logo-detection • 278 runs

adirik/owlvit-base-patch32

Detect objects in an image using free-form text queries (zero-shot, open-vocabulary). Accepts an image and a comma-separ...

🖼️ • image-object-detection • open-vocabulary-detection • 24.4K runs

zsxkib/yolo-world

Detect objects from arbitrary, user-defined categories in images and videos in real time. Takes an image or video plus a...

🖼️ • image-object-detection • open-vocabulary • 12.2K runs

cjwbw/openpsg

Generate panoptic scene graphs from an input image. Segment both “things” and “stuff” at pixel level, detect objects and...

🖼️ • image-segmentation • scene-graph-generation • image-object-detection • 1.5K runs