jd7h/edit-video-by-editing-text
Edit videos by editing the transcript. Input a video and either transcribe it to text, or supply a desired transcript to...
Found 33 models (showing 21-33)
Generate text, captions, and summaries from text, image, and video inputs. Support question answering, code generation a...
Analyze images or video and generate text captions, answers, and summaries. Accepts single or multiple images or a video...
Extract structured data, answer visual questions, and summarize videos from images and videos. Accepts 1–4 images or a v...
Caption images and videos and answer visual questions. Accepts an optional image or video plus a text prompt and returns...
Transcribe speech from online videos into timestamped text. Accepts a video URL (YouTube and other supported sites) and...
Transcribe audio or video to text. Accepts an audio or video input and returns a JSON transcript or ASS subtitles, lever...
Transcribe or translate speech from audio files and videos to text. Accept audio or video input and return a transcript...
Hold multi-turn, multimodal conversations grounded in images, audio, video, and text, returning answers as text and opti...
Generate and reason over text from prompts, with optional image, audio, and video inputs. Produce answers, explanations,...
Caption images, videos, and audio; answer media-grounded questions; and localize referred objects via visual grounding....
Generate and analyze text from prompts, images, audio, and video with fast, low-latency responses. Accept text plus up t...
Generate text from prompts with optional image, video, and audio context. Accepts multimodal inputs (up to 10 images, 10...