chenxwh/deepseek-vl2
Answer questions about images and documents. Accepts 1–3 images plus an instruction and returns text for tasks like visu...
Found 62 models (showing 41-60)
Analyze images and videos to generate captions, answer visual questions, and summarize scenes. Accepts an image or a vid...
Caption images, detect objects, and extract text from an input image, returning text outputs. Accepts an image plus a ta...
Analyze images and generate text responses to prompts. Accepts an image and a text prompt, and outputs text for visual q...
Analyze images to generate captions, detect objects, and extract text (OCR). Accepts an image plus a task selector and o...
Answer questions about images, documents, charts, and tables. Takes an image and a text prompt and returns text. Support...
Analyze images and answer questions from an input image and text instruction, returning text. Support visual question an...
Parse GUI screenshots into structured UI elements with bounding boxes and captions. Accepts an image of a desktop or mob...
Extract structured purchase data from receipt images as JSON. Input a receipt image and output JSON with line items, qua...
Extract text from images and documents in 90+ languages with OCR, returning plain text plus optional structured layout....
Convert documents to Markdown and structured JSON. Accept PDF, DOC/DOCX, PPT/PPTX, and image files (PNG/JPG/WEBP) as inp...
Extract text and document structure from an input image into Markdown or plain text. Handle PDFs, scans, screenshots, re...
Answer questions about images. Accept an image and a text prompt and return text outputs for visual question answering,...
Extract structured data, answer visual questions, and summarize videos from images and videos. Accepts 1–4 images or a v...
Extract text and document structure from images into plain text or Markdown. Accept an image and a task type (markdown,...
Extract text and tables from document images or PDFs. Accepts an image or a selected PDF page and returns structured tex...
Answer questions about images and generate captions from an image input and a natural-language question, returning text....
Answer questions about images with grounded visual references. Takes an image and a natural-language prompt and returns...
Answer questions about images from a text prompt and an image input, returning a text response. Perform visual question...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...