lucataco/nemotron-nano-vl-8b-v1
Answer questions about an image and generate captions and summaries. Accepts a single image and a natural-language quest...
Found 166 models (showing 101-120)
Generates text responses based on prompts and can analyze images. Excels at coding tasks with state-of-the-art performan...
Generate SDXL-ready text prompts from an input image. Analyze visual content and style using CLIP Interrogator (OpenCLIP...
Generate detailed SDXL-ready prompts from an input image. Use a CLIP-Interrogator-based pipeline to extract artists, sty...
Classify images into ImageNet-1k categories. Takes a single image as input and outputs ranked class labels (WordNet syns...
Analyzes images and answers questions about visual content through multimodal conversation. Designed as a foundation mod...
Caption images and answer visual questions from an input image, returning text. Accepts an image and a natural-language...
Caption images, detect objects, and extract text from an input image, returning text outputs. Accepts an image plus a ta...
Analyzes images and answers questions about them through conversational interaction. Takes an image and a text prompt as...
Answer questions about images and perform visual reasoning from an image and a text prompt, returning text. Handle visua...
Analyze images to generate captions, detect objects, and extract text (OCR). Accepts an image plus a task selector and o...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...
Generate text for chat, Q&A, coding, and document workflows with fast, low-latency responses. Accept text prompts and op...
Analyze images and answer questions from an input image and text instruction, returning text. Support visual question an...
Extract dominant hex color codes and caption or answer questions about an input image. Accepts an image and an optional...
Automate GUI interactions by predicting where to click from a screenshot and a natural-language command. Takes a GUI scr...
Extract text with pixel coordinates from images and screenshots. Accepts an image and returns readable text (markdown) p...
Classify plant leaf images into disease categories. Takes a single image as input and returns a text label for the predi...
Generate text responses from text, image, and audio inputs. Perform image captioning and visual question answering, OCR,...
Answer questions about images from a text prompt and an image input, returning a text response. Perform visual question...