lucataco/nemotron-nano-vl-8b-v1
Answer questions about an image and generate captions and summaries. Accepts a single image and a natural-language quest...
Found 150 models (showing 101-120)
Generate text and code from a prompt, with optional image analysis for captions and visual reasoning. Accepts a text pro...
Generate SDXL-ready text prompts from an input image. Analyze visual content and style using CLIP Interrogator (OpenCLIP...
Generate detailed SDXL-ready prompts from an input image. Use a CLIP-Interrogator-based pipeline to extract artists, sty...
Classify images into ImageNet-1k categories. Takes a single image as input and outputs ranked class labels (WordNet syns...
Answer questions about images. Accepts an image and an optional text prompt and returns a text response for visual quest...
Caption images and answer visual questions from an input image, returning text. Accepts an image and a natural-language...
Caption images, detect objects, and extract text from an input image, returning text outputs. Accepts an image plus a ta...
Answer questions about images. Accept an image and a text prompt and return text outputs for visual question answering,...
Analyze images and generate text responses to prompts. Accepts an image and a text prompt, and outputs text for visual q...
Analyze images to generate captions, detect objects, and extract text (OCR). Accepts an image plus a task selector and o...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...
Generate text for chat, Q&A, coding, and document workflows with fast, low-latency responses. Accept text prompts and op...
Analyze images and answer questions from an input image and text instruction, returning text. Support visual question an...
Extract dominant hex color codes and caption or answer questions about an input image. Accepts an image and an optional...
Automate GUI interactions by predicting where to click from a screenshot and a natural-language command. Takes a GUI scr...
Extract text with pixel coordinates from images and screenshots. Accepts an image and returns readable text (markdown) p...
Classify plant leaf images into disease categories. Takes a single image as input and returns a text label for the predi...
Generate text from text, image, and audio inputs. Handle transcription, summarization, and visual description/QA, includ...
Answer questions about images from a text prompt and an image input, returning a text response. Perform visual question...