
chenxwh/cogvlm2-video
Generate text descriptions and answers from a video input. Accepts a video and an optional prompt to perform video capti...
Answer questions and generate detailed descriptions from a video input. Provide a video and a text prompt to get caption...
Answer questions about videos and generate detailed captions from a video input. Accepts a video and a natural-language...
Generate text descriptions and answers from a video input. Accepts a video and a natural-language prompt to perform vide...
Caption videos and answer open-ended questions about their content. Accept one or more video inputs plus a list of natur...
Caption images and long videos and answer visual questions, returning text. Accepts an image or video plus an instructio...
Caption videos and answer questions about their content. Accepts a video and a natural-language prompt and outputs text...
Analyze videos and generate text descriptions, answers, and summaries from a prompt. Accepts a video and an instruction,...
Caption videos. Provide a video and an optional instruction prompt to produce a single text output for captioning, summa...
Answer questions about images and videos. Accepts an image or a video plus a question and returns text, enabling visual...
Caption images and summarize videos from a text prompt, returning natural-language descriptions and answers. Accepts an...
Transcribe speech to text from audio or video inputs. Auto-detect language or specify one, and optionally translate the...
Answer questions about video content in a multi-turn chat. Take a video and a chat message history as input and return a...
Generate text descriptions for images and videos. Accepts an image or video plus an optional prompt and returns text tha...
Generate captions, answers, and summaries from an input image or video. Accept an image or video plus an optional prompt...
Caption images and videos from a text prompt, returning textual descriptions and summaries. Accept an image or a video p...
Caption and answer questions about videos. Takes a video and a text prompt and returns text, enabling detailed descripti...
Assess video quality from a video input. Return JSON text with numeric scores for aesthetic appeal, technical quality, a...
Chat and analyze across text, images, audio, and video, returning text responses and optional synthesized speech. Accept...
Transcribe spoken words from silent video using visual speech recognition (lip reading). Input a short clip (2–40 second...