zsxkib/create-video-dataset
About
Easily create video datasets with auto-captioning for Hunyuan-Video LoRA finetuning

Example Output
Performance Metrics
- Prediction Time: 10.07s
- Total Time: 40.47s
All Input Parameters
{
  "duration": 10,
  "end_time": 40,
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "start_time": 10,
  "autocaption": true,
  "num_segments": 3,
  "trigger_word": "RICKROLL",
  "caption_prompt": "Describe this video clip briefly, focusing on the main action and visual elements.",
  "autocaption_prefix": "a video of RICKROLL, "
}
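The example above sets start_time 10, end_time 40, duration 10 and num_segments 3, and the execution log below reports three back-to-back clips (10.0s to 20.0s, 20.0s to 30.0s, 30.0s to 40.0s). The sketch below reproduces that arithmetic under stated assumptions: clips are laid end to end starting at start_time, each lasts duration seconds, none runs past end_time, and an end_time of 0 falls back to a video length the caller supplies. plan_segments is a hypothetical helper, not the model's implementation.

```python
from typing import List, Optional, Tuple


def plan_segments(start_time: float, end_time: float, duration: float,
                  num_segments: int,
                  video_length: Optional[float] = None) -> List[Tuple[float, float]]:
    """Return (start, end) pairs for consecutive, fixed-length clips.

    Hypothetical reconstruction from the example log, not the model's code.
    """
    if end_time == 0:
        # end_time 0 means "process until the end"; this sketch assumes the
        # caller passes the video length in that case.
        if video_length is None:
            raise ValueError("video_length is required when end_time is 0")
        end_time = video_length
    if duration <= 0 or end_time <= start_time:
        raise ValueError("need duration > 0 and end_time > start_time")

    segments: List[Tuple[float, float]] = []
    for i in range(num_segments):
        seg_start = start_time + i * duration
        if seg_start >= end_time:
            break  # the requested window is already used up
        # Truncate the last clip so it never extends past end_time.
        segments.append((seg_start, min(seg_start + duration, end_time)))
    return segments


# Values from the example input above.
for n, (s, e) in enumerate(plan_segments(10, 40, 10, 3), start=1):
    print(f"Created segment {n}/3 ({s:.1f}s to {e:.1f}s)")
```

With the example values this prints the same three ranges the log reports. Truncating the final clip (rather than dropping it) is a design choice of the sketch; a dataset builder that needs uniform clip lengths might prefer to discard short tails instead.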
Input Parameters
- quality: Video quality preset, one of 'fast' (lower quality, smaller files), 'balanced', or 'high' (best quality, larger files).
- end_time: End time in seconds for video processing. Set to 0 to process until the end of the video.
- video_url: YouTube or other video URL to process. Leave empty if uploading a file. If both a URL and a file are provided, the URL takes precedence.
- num_scenes: Number of scenes to extract (0 = all detected scenes).
- skip_intro: Automatically skip the first 10 seconds (typically an intro).
- start_time: Start time in seconds for video processing.
- target_fps: Target frame rate (e.g., 24 or 30). Set to -1 to keep the original frame rate.
- video_file: Video file to process. Leave empty if using a URL. Ignored if a URL is provided.
- autocaption: Let AI generate a caption for your video. If False, you must provide custom_caption.
- preview_only: Generate scene previews without creating the full dataset.
- trigger_word: Trigger word to include in captions (e.g., TOK, STYLE3D). It is added at the start of the caption.
- caption_style: Caption style, one of 'minimal' (short), 'detailed' (longer descriptions), or 'custom'.
- custom_caption: Your custom caption. Required if caption_style is 'custom' or autocaption is False.
- detection_mode: Scene detection method, one of 'content' (fast cuts), 'adaptive' (camera movement), or 'threshold' (fades).
- max_scene_length: Maximum scene length in seconds.
- min_scene_length: Minimum scene length in seconds.
- autocaption_prefix: Text to add before the caption. Example: 'a video of'
- autocaption_suffix: Text to add after the caption. Example: 'in a cinematic style'
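The parameter descriptions above imply a handful of consistency rules (allowed preset values, when custom_caption is required, the 0 and -1 sentinels, URL precedence over a file). A minimal client-side check could look like the sketch below. check_inputs and pick_source are hypothetical helpers; they only check keys that are present, because the model's real defaults are not documented here, and the model performs its own validation, which may differ.

```python
from typing import Any, Dict, List, Optional

QUALITY = {"fast", "balanced", "high"}
DETECTION_MODES = {"content", "adaptive", "threshold"}
CAPTION_STYLES = {"minimal", "detailed", "custom"}


def check_inputs(p: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a parameter dict (empty if none)."""
    problems: List[str] = []
    if not p.get("video_url") and not p.get("video_file"):
        problems.append("provide video_url or video_file")
    # Enumerated options are only checked when the key is supplied.
    for key, allowed in (("quality", QUALITY),
                         ("detection_mode", DETECTION_MODES),
                         ("caption_style", CAPTION_STYLES)):
        if key in p and p[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    # custom_caption is required for the 'custom' style or when autocaption is off.
    needs_custom = p.get("caption_style") == "custom" or p.get("autocaption") is False
    if needs_custom and not p.get("custom_caption"):
        problems.append("custom_caption is required")
    end = p.get("end_time", 0)  # 0 means "until the end"
    if end != 0 and end <= p.get("start_time", 0):
        problems.append("end_time must be 0 (until the end) or after start_time")
    fps = p.get("target_fps", -1)  # -1 means "keep original fps"
    if fps != -1 and fps <= 0:
        problems.append("target_fps must be -1 (keep original) or positive")
    if p.get("num_scenes", 0) < 0:  # 0 means "all detected scenes"
        problems.append("num_scenes must be 0 (all scenes) or positive")
    lo, hi = p.get("min_scene_length"), p.get("max_scene_length")
    if lo is not None and hi is not None and lo > hi:
        problems.append("min_scene_length must not exceed max_scene_length")
    return problems


def pick_source(p: Dict[str, Any]) -> Optional[str]:
    """The URL takes precedence when both a URL and a file are given."""
    return p.get("video_url") or p.get("video_file")


example = {"video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
           "start_time": 10, "end_time": 40, "autocaption": True,
           "trigger_word": "RICKROLL",
           "autocaption_prefix": "a video of RICKROLL, "}
print(check_inputs(example))  # []
```

Collecting every problem in a list, rather than raising on the first one, lets a caller report all mistakes in a single pass before submitting a prediction.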
Output Schema
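A single .zip archive containing a videos/ folder with one .mp4 clip and a matching .txt caption file for each extracted clip (see the execution log below).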
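According to the execution log below, each clip in the archive sits next to a caption file with the same stem (for example videos/..._seg01.mp4 and videos/..._seg01.txt). The sketch below loads such an archive into (clip, caption) pairs using Python's standard zipfile module. It assumes only that naming convention as seen in the log; load_pairs is a hypothetical helper.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def load_pairs(zip_bytes: bytes) -> List[Tuple[str, str]]:
    """Pair every .mp4 in the archive with the .txt caption of the same stem.

    Assumes the layout seen in the example log: videos/<name>_segNN.mp4
    next to videos/<name>_segNN.txt. Clips without a caption are skipped.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        # Map "videos/<stem>" -> caption text.
        captions: Dict[str, str] = {}
        for name in names:
            path = PurePosixPath(name)
            if path.suffix == ".txt":
                captions[str(path.with_suffix(""))] = zf.read(name).decode("utf-8").strip()
        pairs: List[Tuple[str, str]] = []
        for name in sorted(names):
            path = PurePosixPath(name)
            key = str(path.with_suffix(""))
            if path.suffix == ".mp4" and key in captions:
                pairs.append((name, captions[key]))
    return pairs


# Demo with a tiny in-memory archive; the .mp4 bytes are placeholders.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("videos/clip_seg01.mp4", b"not a real video")
    zf.writestr("videos/clip_seg01.txt", "example caption\n")
print(load_pairs(buf.getvalue()))  # [('videos/clip_seg01.mp4', 'example caption')]
```

For a downloaded result, pass the file's bytes instead, e.g. load_pairs(pathlib.Path("processed_videos_20250117_192541.zip").read_bytes()).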
Example Execution Logs
📥 Downloading video from: https://www.youtube.com/watch?v=dQw4w9WgXcQ
[youtube] Extracting URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
[youtube] dQw4w9WgXcQ: Downloading webpage
[youtube] dQw4w9WgXcQ: Downloading tv player API JSON
[youtube] dQw4w9WgXcQ: Downloading ios player API JSON
[youtube] dQw4w9WgXcQ: Downloading player 6e1dd460
[youtube] dQw4w9WgXcQ: Downloading m3u8 information
[info] dQw4w9WgXcQ: Downloading 1 format(s): 18
[download] Destination: /tmp/video_processing/videos/Rick Astley - Never Gonna Give You Up (Official Music Video).mp4
[download] 0.0% of 8.68MiB at Unknown B/s ETA Unknown
Downloading... 0%
[download] 0.0% of 8.68MiB at Unknown B/s ETA Unknown
Downloading... 0%
[download] 0.1% of 8.68MiB at 1.65MiB/s ETA 00:05
[download] 0.2% of 8.68MiB at 3.35MiB/s ETA 00:02
[download] 0.3% of 8.68MiB at 6.58MiB/s ETA 00:01
[download] 0.7% of 8.68MiB at 6.38MiB/s ETA 00:01
[download] 1.4% of 8.68MiB at 9.24MiB/s ETA 00:00
[download] 2.9% of 8.68MiB at 13.67MiB/s ETA 00:00
[download] 5.7% of 8.68MiB at 21.90MiB/s ETA 00:00
[download] 11.5% of 8.68MiB at 36.51MiB/s ETA 00:00
[download] 23.0% of 8.68MiB at 61.87MiB/s ETA 00:00
[download] 46.1% of 8.68MiB at 95.07MiB/s ETA 00:00
[download] 92.2% of 8.68MiB at 147.88MiB/s ETA 00:00
[download] 100.0% of 8.68MiB at 155.89MiB/s ETA 00:00
Downloading... 100%
[download] 100% of 8.68MiB in 00:00:00 at 99.17MiB/s
Download completed ✓
📝 Renamed file to: rickastley_nevergonnagiveyouupofficialmusicvideo.mp4
✂️ Splitting video from 10.0s to 40.0s
Creating 3 segments of 10.0s each
✓ Created segment 1/3 (10.0s to 20.0s)
✓ Created segment 2/3 (20.0s to 30.0s)
✓ Created segment 3/3 (30.0s to 40.0s)
🎬 Processing segment 1/3
🤖 Generating caption using AI...
qwen-vl-utils using torchvision to read video.
📝 Caption for segment 1:
--------------------
a video of RICKROLL, RICKROLL The video features a man and a woman dancing in different locations, including a room with a brick wall and a white wall. The man is wearing a blue shirt and blue jeans, while the woman is wearing a white dress. The video also shows a man in a white jacket and black shirt dancing in a room with a brick wall.
--------------------
🎬 Processing segment 2/3
🤖 Generating caption using AI...
📝 Caption for segment 2:
--------------------
a video of RICKROLL, RICKROLL The video features a man in a white jacket who is dancing and singing in front of a brick wall. He is later joined by another man in a blue shirt who also starts dancing and singing. The video captures the energy and movement of the two men as they perform together.
--------------------
🎬 Processing segment 3/3
🤖 Generating caption using AI...
📝 Caption for segment 3:
--------------------
a video of RICKROLL, RICKROLL The video features a man wearing a blue shirt and sunglasses who is dancing and singing in various locations, including a tunnel and in front of a white brick wall. The man is also seen throwing a ball in the air.
--------------------
📦 Creating zip file...
📋 Zip contents:
--------------------
Size Name
---- ----
698.8K videos/rickastley_nevergonnagiveyouupofficialmusicvideo_seg01.mp4
0.3K videos/rickastley_nevergonnagiveyouupofficialmusicvideo_seg01.txt
616.3K videos/rickastley_nevergonnagiveyouupofficialmusicvideo_seg02.mp4
0.3K videos/rickastley_nevergonnagiveyouupofficialmusicvideo_seg02.txt
791.1K videos/rickastley_nevergonnagiveyouupofficialmusicvideo_seg03.mp4
0.2K videos/rickastley_nevergonnagiveyouupofficialmusicvideo_seg03.txt
--------------------
✨ Success! Output saved to: processed_videos_20250117_192541.zip
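Two details of the log above can be reproduced in a few lines: every caption starts with the autocaption_prefix ("a video of RICKROLL, "), followed by the trigger word and then the generated text, and the downloaded title is turned into a lowercase file stem with _segNN suffixes per clip. The sketch below is a hypothetical reconstruction that matches this one example only. The joining rule (prefix used verbatim, other parts separated by single spaces) and the title normalization rule (hyphens become underscores, other non-alphanumeric characters are dropped) are assumptions, not the model's code.

```python
import re


def assemble_caption(generated: str, trigger_word: str = "",
                     prefix: str = "", suffix: str = "") -> str:
    """prefix + trigger word + generated text + suffix.

    Assumptions: the prefix is used verbatim (the example prefix already
    ends with a space) and the remaining parts are joined by single spaces.
    """
    parts = [trigger_word, generated.strip(), suffix]
    return prefix + " ".join(part for part in parts if part)


def normalize_title(title: str) -> str:
    """Assumed rule: '-' becomes '_', other non-alphanumerics are dropped."""
    return re.sub(r"[^0-9A-Za-z_]", "", title.replace("-", "_")).lower()


def segment_name(stem: str, index: int, ext: str) -> str:
    """Segment files in the log follow the pattern <stem>_seg01.mp4 / .txt."""
    return f"{stem}_seg{index:02d}{ext}"


stem = normalize_title("Rick Astley - Never Gonna Give You Up (Official Music Video)")
print(stem)  # rickastley_nevergonnagiveyouupofficialmusicvideo
print(segment_name(stem, 1, ".txt"))
print(assemble_caption("The video features a man in a white jacket ...",
                       trigger_word="RICKROLL", prefix="a video of RICKROLL, "))
```

Because the example prefix already contains the trigger word, the log's captions mention RICKROLL twice; a prefix without the trigger word (such as 'a video of') avoids that repetition.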
Version Details
- Version ID: c88dab32692c79764ee43fe43956cf81e0592065150d7d5d5d67090608a6dc5d
- Version Created: April 2, 2025