nelsonjchen/minigpt-4_vicuna-13b
Answer questions about images and generate detailed image captions using MiniGPT-4 with Vicuna-13B language model. Takes...
Found 82 models (showing 61-80)
Analyzes images and answers questions about visual content using a Mixture-of-Experts architecture. Takes an image and t...
Answer questions about images in Korean. Take an image and a Korean prompt and generate Korean text for visual question...
Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...
Caption images and videos and answer visual questions. Accepts an optional image or video plus a text prompt and returns...
Answer questions about images and generate captions from an image and a text prompt, outputting text. Perform visual que...
Answer questions about images from a text prompt and return a text response. Accept a single image plus a natural-langua...
Visualize which regions of an image CLIP associates with a given text prompt. Generates a saliency heatmap, optionally o...
Generates text responses from prompts with advanced reasoning capabilities, supporting multimodal inputs including image...
Generate text responses from text, image, video, and audio inputs with controllable reasoning depth. Supports up to 1 mi...
Answer questions and caption images from a text prompt and an optional image, returning text. Generate long-form text an...
Estimate 2D poses of multiple people in an image using a lightweight version of OpenPose. Outputs include 18 keypoints p...
Analyze images and generate detailed textual descriptions based on visual content. Supports input via image URLs or base...
Generate text responses from text, image, and audio inputs. Perform image captioning and visual question answering, OCR,...
Analyze images and text to generate answers, working code, and polished documents. Takes a text prompt with an optional...
Generates text responses to medical questions and analyzes medical images for research and educational purposes. Based o...
Generate text from prompts with configurable reasoning effort and verbosity for complex professional work, coding, and m...
Generate text and analyze images with Anthropic's most advanced language model, featuring state-of-the-art coding, reaso...
Advanced multimodal language model that processes text, images, videos, and audio to generate text responses. Features t...
Generate text responses with advanced reasoning and visual understanding capabilities from text prompts and optional ima...