paragekbote/gemma3-torchao-quant-sparse
Generate text and analyze images from a text prompt (optionally with an image), returning text for conversation, caption...
Found 82 models (showing 61-80)
Generate text and analyze images from a text prompt (optionally with an image), returning text for conversation, caption...
Caption images and answer questions about images. Takes an image and a text prompt as input and returns text, enabling i...
Generates text responses from prompts with advanced reasoning capabilities, supporting multimodal inputs including image...
Generate text responses from text, image, video, and audio inputs with controllable reasoning depth. Supports up to 1 mi...
Answer questions and caption images from a text prompt and an optional image, returning text. Generate long-form text an...
Analyze images and generate detailed textual descriptions based on visual content. Supports input via image URLs or base...
Generate text responses with advanced reasoning capabilities for professional knowledge work, coding, and agentic tasks....
Generate text responses from text, image, and audio inputs. Perform image captioning and visual question answering, OCR,...
Generates text responses to medical questions and analyzes medical images for research and educational purposes. Based o...
Generate text from prompts with configurable reasoning effort and verbosity for complex professional work, coding, and m...
Generate text responses from prompts or conversations with configurable reasoning effort and verbosity. Designed specifi...
Answer questions about images and generate detailed image captions using MiniGPT-4 with Vicuna-13B language model. Takes...
Generate text and analyze images with Anthropic's most advanced language model, featuring state-of-the-art coding, reaso...
Generate text responses with advanced reasoning and visual understanding capabilities from text prompts and optional ima...
Advanced multimodal language model that processes text, images, videos, and audio to generate text responses. Features t...
Answers questions about images using natural language. Takes an image and text prompt as input and generates contextual...
Analyzes images and responds to text prompts about visual content. Takes an image and a text prompt as input, then gener...
Analyze images and answer questions about visual content using a Mixture-of-Experts vision-language model. Processes sin...
Answers questions about images through multimodal understanding. Takes an image and a text question as input and generat...
Generates text responses from text, image, and video inputs using a multimodal reasoning model. Processes questions abou...