pku-yuangroup/llava-cot
Answer questions about images with step-by-step reasoning. Take an image and an optional text prompt and output text, in...
Found 166 models (showing 121-140)
Answer questions about images from an image and text prompt, returning text responses. Perform visual question answering...
Answer questions about images and generate captions from an image and a text prompt, returning text. Perform visual ques...
Extract text from images and documents in 90+ languages with OCR, returning plain text plus optional structured layout....
Generates text responses based on prompts or multi-turn conversations, designed as a faster and more cost-effective vers...
Generates descriptive text captions from three input images using arithmetic operations on image features. The model com...
Classify images into categories. Accepts a single image and returns top predicted classes with probabilities using a Res...
Converts images containing documents, PDFs, charts, and handwritten text into structured markdown while preserving forma...
Classify the nationality of a national ID card from an image. Accepts a photo or scan of an ID document and returns the...
Generate image captions from a single image input. Select from COCO or Conceptual Captions modes and optionally use beam...
Caption images. Accepts an image input and generates a zero-shot natural-language description, optionally conditioned by...
Generate text responses from text, image, video, and audio inputs with controllable reasoning depth. Supports up to 1 mi...
Classify age range from an input image. Accepts a single image and predicts a discrete age-range label with a confidence...
Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...
Extract structured data, answer visual questions, and summarize videos from images and videos. Accepts 1–4 images or a v...
Caption images and videos and answer visual questions. Accepts an optional image or video plus a text prompt and returns...
Caption images and answer visual questions from an input image, returning text. Accept an image and an optional instruct...
Moderate images for safety and policy compliance. Takes an image input (optionally a custom prompt) and outputs structur...
Caption images by generating detailed, paragraph-length natural-language descriptions from a single image input. Outputs...
Extract text and convert documents to markdown format from images using optical character recognition. Supports multiple...