nelsonjchen/minigpt-4_vicuna-13b
Answer questions about images and generate captions. Accepts an image and an optional text prompt/question, and returns...
Found 73 models (showing 61-73)
Answer questions about images and generate captions. Accepts an image and an optional text prompt/question, and returns...
Analyze images and answer questions from an image plus text prompt, returning text. Handle visual question answering (VQ...
Answer questions about images in Korean. Take an image and a Korean prompt and generate Korean text for visual question...
Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...
Caption images and videos and answer visual questions. Accepts an optional image or video plus a text prompt and returns...
Answer questions about images and generate captions from an image and a text prompt, outputting text. Perform visual que...
Answer questions about images from a text prompt and return a text response. Accept a single image plus a natural-langua...
Visualize which regions of an image CLIP associates with a given text prompt. Generates a saliency heatmap, optionally o...
Generate and reason over text from prompts, with optional image, audio, and video inputs. Produce answers, explanations,...
Generate text and multimodal analyses from text, image, and video inputs. Handle very long contexts (around 1M tokens) f...
Answer questions and caption images from a text prompt and an optional image, returning text. Generate long-form text an...
Estimate 2D poses of multiple people in an image using a lightweight version of OpenPose. Outputs include 18 keypoints p...
Analyze images and generate detailed textual descriptions based on visual content. Supports input via image URLs or base...