lucataco/nemotron-nano-vl-8b-v1
Answer questions about an image and generate captions and summaries. Accepts a single image and a natural-language quest...
Found 157 models (showing 101-120)
Generate text and code from a prompt, with optional image analysis for captions and visual reasoning. Accepts a text pro...
Generate SDXL-ready text prompts from an input image. Analyze visual content and style using CLIP Interrogator (OpenCLIP...
Generate detailed SDXL-ready prompts from an input image. Use a CLIP-Interrogator-based pipeline to extract artists, sty...
Classify images into ImageNet-1k categories. Takes a single image as input and outputs ranked class labels (WordNet syns...
Answer questions about images. Accepts an image and an optional text prompt and returns a text response for visual quest...
Caption images and answer visual questions from an input image, returning text. Accepts an image and a natural-language...
Caption images, detect objects, and extract text from an input image, returning text outputs. Accepts an image plus a ta...
Answer questions about images. Accept an image and a text prompt and return text outputs for visual question answering,...
Answer questions about images and perform visual reasoning from an image and a text prompt, returning text. Handle visua...
Analyze images to generate captions, detect objects, and extract text (OCR). Accepts an image plus a task selector and o...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...
Generate text for chat, Q&A, coding, and document workflows with fast, low-latency responses. Accept text prompts and op...
Analyze images and answer questions from an input image and text instruction, returning text. Support visual question an...
Extract dominant hex color codes and caption or answer questions about an input image. Accepts an image and an optional...
Automate GUI interactions by predicting where to click from a screenshot and a natural-language command. Takes a GUI scr...
Extract text with pixel coordinates from images and screenshots. Accepts an image and returns readable text (markdown) p...
Classify plant leaf images into disease categories. Takes a single image as input and returns a text label for the predi...
Generate text responses from text, image, and audio inputs. Perform image captioning and visual question answering, OCR,...
Answer questions about images from a text prompt and an image input, returning a text response. Perform visual question...