chenxwh/cogvlm2-video
Caption and answer questions about videos. Accepts a video and a text prompt and returns text outputs such as descriptio...
Found 30 models (showing 1-20)
Answer questions about a video and generate detailed descriptions from a video input. Takes a video and a natural-langua...
Analyze videos and generate text outputs, including detailed captions, summaries, and answers to questions about the con...
Analyze videos and return text responses to prompts. Takes a video and a text prompt as input and outputs text for video...
Answer questions and generate descriptions from video input. Provide one or more videos and a text prompt, and receive t...
Generate captions, answers, and summaries for images and videos. Accepts an image or video plus a text prompt and output...
Generate text descriptions and answers about video content from a video input. Accept a video plus an optional prompt (d...
Generate text from a video input. Take a video and an instruction prompt, and return captions, summaries, or answers to...
Caption and answer questions about videos. Accepts a video and an optional text prompt (instruction or question) and ret...
Answer questions about images and videos, perform OCR, and describe scenes, returning text. Accepts an image or a video...
Analyze images and videos to generate captions, answer visual questions, and summarize scenes. Accepts an image or a vid...
Transcribe speech from audio or video into text. Outputs a full transcript with optional per-segment timestamps and spea...
Answer questions about videos in multi-turn conversations. Accepts a video and a chat message history, and returns text...
Caption images and videos. Take an image or video plus an optional prompt and return text that describes the visual cont...
Generate captions, answers, and summaries from an input image or video. Accept an image or video plus an optional prompt...
Generate text descriptions for images and videos. Accepts a single image or video plus an optional instruction prompt, a...
Answer questions about videos and generate detailed descriptions from a video input and a text prompt. Handle long-form...
Assess video quality from a video input, returning separate aesthetic, technical, and overall scores. Use to evaluate us...
Chat and reason across text, images, audio, and video, outputting text and synthesized speech. Accept text prompts with...
Transcribe speech from silent or muted videos into text using visual speech recognition (lip reading). Accepts a video c...