peter65374/cog-resnet
Classify images by identifying objects and assigning confidence scores to each detected object.
Answer questions about images and generate captions from a text prompt and an optional image, returning text. Perform vi...
Generates text responses from prompts with ultra-low latency and fast response times. Supports up to 1 million token con...
Answer questions about images and generate captions from an image and a text query, returning text. Accept a single imag...
Detect the likelihood of deepfake faceswaps in images. This model focuses on identifying faceswaps with high confidence,...
Generate text and analyze images from a text prompt (optionally with an image), returning text for conversation, caption...
Answer questions about images. Takes an image and a text prompt and returns a text response, enabling visual question an...
Compute an integer from an input image. Accepts an image and outputs a numeric value, useful for testing image input pip...
Generate fine-grained captions for images using a CLIP-based reward system. This model evaluates image captions based on...
Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Takes an imag...
Answer questions about images and text with multimodal reasoning. Takes a text prompt with an optional image and outputs...
Generates text responses from text prompts, messages, and images with multimodal capabilities. Processes both text and v...
Generate text responses with advanced reasoning capabilities, specializing in math, coding, and visual analysis. Process...
Caption images and answer visual questions from an input image and text prompt. Accept an image plus a question or instr...
Analyzes images and generates text descriptions or answers questions about visual content. Uses a projection module trai...
Answer questions about images, caption scenes, and localize entities with bounding boxes. Accept a ChatML-formatted prom...
Generate text responses from prompts with support for image analysis and visual understanding. Fast, lightweight languag...
Generate text based on text prompts and optional image inputs. This multimodal language model handles both text and imag...
Answer questions about images and documents from an image and a text prompt, returning text. Handle visual question answ...
Caption images and answer visual questions from an image plus an optional text prompt, returning text. Handle OCR-style...