video-to-text AI Models - Cloudernative

chenxwh/cogvlm2-video

Caption and answer questions about videos. Accepts a video and a text prompt and returns text outputs such as descriptio...

🎥 • video-to-text • video-auto-captioning • 664.1K runs

🤖 Model 🎥

uncensored-com/llava-next-video

Analyzes video content and answers questions about it using text prompts. Takes a video file as input and generates text...

🎥 • video-to-text • video-analysis • question-answering • 3.7K runs

🤖 Model 🎥

lucataco/apollo-7b

Analyzes videos and answers questions about video content using a 7 billion parameter multimodal model. Handles long-for...

🎥 • video-to-text • video-embedding • 124.9K runs

🤖 Model 📝 → 📝

lucataco/videollama3-7b

Analyzes videos and generates text descriptions or answers questions about video content. Built on Qwen2.5-7B, this mult...

📝 → 📝 • video-to-text • video-analysis • question-answering • 40.5K runs

🤖 Model 🎥

n1jl0091/video-llava-7b-hf_replicate_n1jl0091

Answer questions and generate descriptions from video input. Provide one or more videos and a text prompt, and receive t...

🎥 • video-to-text • video-question-answering • 112 runs

🤖 Model 🖼️ → 📝

lucataco/qwen2-vl-7b-instruct

Analyze images and videos with text prompts to generate detailed descriptions, answer questions, and extract information...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 547.2K runs

🤖 Model 🎥

aodianyun/qwen2-vl-7b

Generate text descriptions and answers about video content from a video input. Accept a video plus an optional prompt (d...

🎥 • video-to-text • 1.6K runs

🤖 Model 🎥

cuuupid/qwen2-vl-2b

Generate text from a video input. Take a video and an instruction prompt, and return captions, summaries, or answers to...

🎥 • video-to-text • video-auto-captioning • 601 runs

🤖 Model 🎥

aodianyun/qwen2-vl-2b

Caption and answer questions about videos. Accepts a video and an optional text prompt (instruction or question) and ret...

🎥 • video-to-text • video-auto-captioning • 130 runs

🤖 Model 🖼️ → 📝

sai88uk/minicpm-v-45-v9

Analyzes single images, multiple images, and high-FPS videos to answer questions about their content. Supports controlla...

🖼️ → 📝 • image-to-text • video-to-text • ocr • 14.3K runs

🤖 Model 🖼️ → 📝

lucataco/minicpm-v-4

Analyze images and videos with text prompts to generate detailed text responses. Handles single images, multiple images,...

🖼️ → 📝 • image-to-text • video-to-text • visual-understanding • 795 runs

🤖 Model

jigsawstack/speech-to-text

Transcribe speech from audio or video into text. Outputs a full transcript with optional per-segment timestamps and spea...

speech-to-text • speaker-diarization • 6 runs

🤖 Model 📝 → 📝

tomasruizt/apollo-7b-multiturn

Analyzes videos and engages in multi-turn conversations about video content. Takes a video file and a conversation histo...

📝 → 📝 • video-to-text • text-generation • 25 runs

🤖 Model 🖼️ → 📝

aodianyun/minicpm-v-26

Analyzes images and videos with text prompts, providing detailed visual understanding and question answering capabilitie...

🖼️ → 📝 • image-to-text • visual-understanding • 16 runs

🤖 Model 🖼️ → 📝

hexiaochun/minicpm_v26

Generate text descriptions and answers from an input image or video. Accept an optional instruction or question prompt t...

🖼️ → 📝 • image-to-text • video-to-text • visual-question-answering • 518 runs

🤖 Model 🖼️ → 📝

aodianyun/minicpm-v-26-int4

Generate text descriptions for images and videos. Accepts a single image or video plus an optional instruction prompt, a...

🖼️ → 📝 • image-to-text • video-to-text • 12 runs

🤖 Model 🎥

lucataco/apollo-3b

Analyzes videos and answers questions about their content through text output. Supports long-form video comprehension in...

🎥 • video-to-text • video-analysis • 162 runs

🤖 Model 🎥

turian/dover-video-quality-assessment

Evaluates video quality by analyzing both aesthetic appeal and technical quality aspects. Takes a video as input and ret...

🎥 • video-to-text • 36 runs

🤖 Model 📝 → 🔊

lucataco/qwen2.5-omni-7b

Process text, images, audio, and video inputs to generate text and speech responses simultaneously. Features a novel Thi...

📝 → 🔊 • text-generation • image-to-text • video-to-text • 32.9K runs

🤖 Model 🎥

basord/lip-reading-ai-vsr

Transcribe speech from silent or muted videos into text using visual speech recognition (lip reading). Accepts a video c...

🎥 • video-to-text • lip-reading • visual-speech-recognition • 10.7K runs