lucataco/interactiveomni-8b
Hold multi-turn, multimodal conversations grounded in images, audio, video, and text, returning answers as text and opti...
Found 157 models (showing 141-157)
Extract text and tables from document images or PDFs. Accepts an image or a selected PDF page and returns structured tex...
Answer questions about images with grounded visual references. Takes an image and a natural-language prompt and returns...
Generate and reason over text and images from prompts or multi-turn chat, with configurable reasoning effort. Accepts te...
Generate descriptive prompts for text-to-image models from a single image. Outputs a CLIP Interrogator-style prompt stri...
Caption images. Takes an image as input and outputs a short natural-language description (image-to-text) using OpenCLIP...
Generate and reason over text from prompts, with optional image, audio, and video inputs. Produce answers, explanations,...
Caption images and answer visual questions from an input image and text prompt. Accept an image plus a question or instr...
Generate and reason over text and images for coding, professional knowledge work, and agentic workflows. Accepts a singl...
Caption images, videos, and audio; answer media-grounded questions; and localize referred objects via visual grounding....
Analyze images and text to generate answers, working code, and polished documents. Takes a text prompt with an optional...
Answer medical questions from text and optional medical images, returning explanatory text. Accept a prompt and optional...
Generate and analyze text from prompts, images, audio, and video with fast, low-latency responses. Accept text plus up t...
Generate text from prompts with optional image, video, and audio context. Accepts multimodal inputs (up to 10 images, 10...
Generate text and analyze images for coding, reasoning, and professional workflows. Accept text prompts or chat messages...
Generate text and analyze images for complex, multi-step work. Accepts a text prompt (and an optional image) and returns...
Answer questions about an input image from a text prompt, returning text. Generate image captions and short visual descr...