chenxwh/deepseek-vl2
Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Processes sin...
Found 65 models (showing 41-60)
Analyze images and videos to generate captions, answer visual questions, and summarize scenes. Accepts an image or a vid...
Caption images, detect objects, and extract text from an input image, returning text outputs. Accepts an image plus a ta...
Answer questions about images and perform visual reasoning from an image and a text prompt, returning text. Handle visua...
Analyze images to generate captions, detect objects, and extract text (OCR). Accepts an image plus a task selector and o...
Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Takes an imag...
Analyze images and answer questions from an input image and text instruction, returning text. Support visual question an...
Parse GUI screenshots into structured UI elements with bounding boxes and captions. Accepts an image of a desktop or mob...
Extract structured purchase data from receipt images as JSON. Input a receipt image and output JSON with line items, qua...
Extract text from images and documents in 90+ languages with OCR, returning plain text plus optional structured layout....
Convert documents to Markdown and structured JSON. Accept PDF, DOC/DOCX, PPT/PPTX, and image files (PNG/JPG/WEBP) as inp...
Convert images containing documents, PDFs, charts, and handwritten text into structured Markdown while preserving forma...
Analyze images and answer questions about them through conversational interaction. Takes an image and a text prompt as...
Extract structured data and answer visual questions from images and videos, and summarize videos. Accepts 1–4 images or a v...
Extract text from document images and convert it to Markdown using optical character recognition (OCR). Supports multiple...
Extract text and tables from document images or PDFs. Accepts an image or a selected PDF page and returns structured tex...
Answer questions about images and generate captions from an image input and a natural-language question, returning text....
Answer questions about images with grounded visual references. Takes an image and a natural-language prompt and returns...
Answer questions about images from a text prompt and an image input, returning a text response. Perform visual question...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...