lucataco/interactiveomni-8b
Processes multiple inputs simultaneously including images, audio, text, and video to generate coherent text and speech r...
Found 166 models (showing 141-160)
Extract text and tables from document images or PDFs. Accepts an image or a selected PDF page and returns structured tex...
Answer questions about images with grounded visual references. Takes an image and a natural-language prompt and returns...
Generate text responses from prompts or conversations with configurable reasoning effort and verbosity. Designed specifi...
Generate descriptive prompts for text-to-image models from a single image. Outputs a CLIP Interrogator-style prompt stri...
Caption images. Takes an image as input and outputs a short natural-language description (image-to-text) using OpenCLIP...
Generates text responses from prompts with advanced reasoning capabilities, supporting multimodal inputs including image...
Caption images and answer visual questions from an input image and text prompt. Accept an image plus a question or instr...
Generate text responses with advanced reasoning capabilities for professional knowledge work, coding, and agentic tasks....
Caption images, videos, and audio; answer media-grounded questions; and localize referred objects via visual grounding....
Analyze images and text to generate answers, working code, and polished documents. Takes a text prompt with an optional...
Answer medical questions from text and optional medical images, returning explanatory text. Accept a prompt and optional...
Generate and analyze text from prompts, images, audio, and video with fast, low-latency responses. Accept text plus up t...
Advanced multimodal language model that processes text, images, videos, and audio to generate text responses. Features t...
Generate text from prompts with configurable reasoning effort and verbosity for complex professional work, coding, and m...
Generate text and analyze images with Anthropic's most advanced language model, featuring state-of-the-art coding, reaso...
Answer questions about an input image from a text prompt, returning text. Generate image captions and short visual descr...
Generate text responses with advanced reasoning and visual understanding capabilities from text prompts and optional ima...
Generate text, code, and engage in multi-modal conversations using Moonshot AI's 1 trillion parameter frontier model wit...
Generates text responses to medical questions and analyzes medical images for research and educational purposes. Based o...