zsxkib/hunyuan-video2video
About
A video-to-video variant of HunyuanVideo, a state-of-the-art text-to-video generation model capable of creating high-quality videos with realistic motion from text descriptions. It takes an input video and a text prompt.

Example Output
Prompt:
"high quality nature video of a excited Bengal Tiger walking through the grass, masterpiece, best quality"
Performance Metrics
- Prediction Time: 241.75s
- Total Time: 322.67s
All Input Parameters
{
  "crf": 19,
  "steps": 30,
  "video": "https://replicate.delivery/pbxt/M5n5MuDgBxhSERj6PvHgz4BJcOUdHc9o1ZBXz454GoGP5DrR/2024-12-03-18%3A25%3A47_seed47039_A%20cat%20walks%20on%20the%20grass%2C%20realistic%20style..mp4",
  "width": 768,
  "height": 768,
  "prompt": "high quality nature video of a excited Bengal Tiger walking through the grass, masterpiece, best quality",
  "flow_shift": 9,
  "force_rate": 0,
  "force_size": "Disabled",
  "frame_rate": 24,
  "custom_width": 512,
  "custom_height": 512,
  "frame_load_cap": 101,
  "guidance_scale": 6,
  "keep_proportion": true,
  "denoise_strength": 0.85,
  "select_every_nth": 1,
  "skip_first_frames": 0
}
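As a rough illustration, the example inputs above could be sent to the model with the `replicate` Python client. This is a minimal sketch, not a documented recipe from this page: it assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set, and the helper name `run_prediction` is hypothetical. The model reference combines the model name with the version ID listed under Version Details.

```python
# Sketch only: assumes `pip install replicate` and REPLICATE_API_TOKEN in the environment.
MODEL_REF = (
    "zsxkib/hunyuan-video2video:"
    "d550f226f28b1030c2fedd2947f39f19b4b0233b50364904538caaf037fb18d3"
)

# Values copied from the "All Input Parameters" example on this page.
EXAMPLE_INPUT = {
    "crf": 19,
    "steps": 30,
    "video": (
        "https://replicate.delivery/pbxt/M5n5MuDgBxhSERj6PvHgz4BJcOUdHc9o1ZBXz454GoGP5DrR/"
        "2024-12-03-18%3A25%3A47_seed47039_A%20cat%20walks%20on%20the%20grass%2C%20realistic%20style..mp4"
    ),
    "width": 768,
    "height": 768,
    "prompt": (
        "high quality nature video of a excited Bengal Tiger walking through "
        "the grass, masterpiece, best quality"
    ),
    "flow_shift": 9,
    "force_rate": 0,
    "force_size": "Disabled",
    "frame_rate": 24,
    "custom_width": 512,
    "custom_height": 512,
    "frame_load_cap": 101,
    "guidance_scale": 6,
    "keep_proportion": True,
    "denoise_strength": 0.85,
    "select_every_nth": 1,
    "skip_first_frames": 0,
}


def run_prediction(model_input):
    """Hypothetical helper: submit the inputs and return whatever the client yields."""
    import replicate  # imported lazily so the dict above can be inspected offline

    return replicate.run(MODEL_REF, input=model_input)
```

Calling `run_prediction(EXAMPLE_INPUT)` would start a billable prediction; judging by the metrics above, a run like this one takes several minutes. The exact return type (a URL string or a file-like object) depends on the client version.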
Input Parameters
- crf: CRF value for output video quality (0-51). Lower values = better quality.
- seed: Set a seed for reproducibility. Random by default.
- steps: Number of sampling (denoising) steps.
- video (required): Input video file.
- width: Output video width (divisible by 16 for best performance).
- height: Output video height (divisible by 16 for best performance).
- prompt: Text prompt describing the desired output video style. Be descriptive.
- flow_shift: Flow shift for temporal consistency. Adjust to tweak video smoothness.
- force_rate: Force a new frame rate on the input video. 0 means no change.
- force_size: Force resize method. 'Disabled' means original size. Otherwise applies custom_width/height.
- frame_rate: Frame rate of the output video.
- custom_width: Custom width if force_size is not 'Disabled'.
- custom_height: Custom height if force_size is not 'Disabled'.
- frame_load_cap: Max frames to load from input video.
- guidance_scale: Embedded guidance scale. Higher values follow the prompt more strictly.
- keep_proportion: Keep aspect ratio when resizing. If true, will adjust dimensions proportionally.
- denoise_strength: Denoise strength (0.0 to 1.0). Higher = more deviation from input content.
- select_every_nth: Use every nth frame (1 = every frame, 2 = every second frame, etc.).
- skip_first_frames: Number of initial frames to skip from the input video.
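The documented ranges and the frame-selection parameters can be sanity-checked locally before submitting a job. The sketch below is illustrative only: the helper names are hypothetical, the order of operations in `select_frames` (skip, then stride, then cap) is an assumption rather than something this page states, and treating `frame_load_cap=0` as "no cap" is likewise assumed.

```python
def snap_to_multiple(value, multiple=16):
    """Round value down to a multiple of `multiple`, never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def check_inputs(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    crf = params.get("crf")
    if crf is not None and not 0 <= crf <= 51:
        problems.append(f"crf={crf} is outside the documented 0-51 range")
    strength = params.get("denoise_strength")
    if strength is not None and not 0.0 <= strength <= 1.0:
        problems.append(f"denoise_strength={strength} is outside 0.0-1.0")
    for key in ("width", "height"):
        size = params.get(key)
        if size is not None and size % 16 != 0:
            problems.append(
                f"{key}={size} is not divisible by 16; "
                f"nearest lower multiple is {snap_to_multiple(size)}"
            )
    return problems


def select_frames(total_frames, skip_first_frames=0, select_every_nth=1,
                  frame_load_cap=0):
    """Indices of input frames that would be kept (assumed order: skip, stride, cap)."""
    indices = list(range(skip_first_frames, total_frames, select_every_nth))
    if frame_load_cap > 0:  # assumption: 0 means "no cap"
        indices = indices[:frame_load_cap]
    return indices
```

For example, `select_frames(300, frame_load_cap=101)` keeps frames 0 through 100, which matches the 101 frames reported in the example run.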
Example Execution Logs
Checking inputs
✅ /tmp/inputs/input.mp4
====================================
Checking weights
✅ hunyuan_video_vae_bf16.safetensors exists in ComfyUI/models/vae
✅ hunyuan_video_720_fp8_e4m3fn.safetensors exists in ComfyUI/models/diffusion_models
====================================
Running workflow
[ComfyUI] got prompt
Executing node 43, title: Load Video (Upload) 🎥🅥🅗🅢, class type: VHS_LoadVideo
Executing node 42, title: Resize Image, class type: ImageResizeKJ
Executing node 39, title: Get Image Size & Count, class type: GetImageSizeAndCount
Executing node 7, title: HunyuanVideo VAE Loader, class type: HyVideoVAELoader
Executing node 38, title: HunyuanVideo Encode, class type: HyVideoEncode
Executing node 16, title: (Down)Load HunyuanVideo TextEncoder, class type: DownloadAndLoadHyVideoTextEncoder
[ComfyUI] Loading text encoder model (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Loading text encoder model (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
[ComfyUI] ColorMod: Can't find pypng! Please install to enable 16bit image support.
[ComfyUI] ColorMod: Ignoring node 'CV2TonemapDurand' due to cv2 edition/version
[ComfyUI] ------------------------------------------
[ComfyUI]
[ComfyUI] Comfyroll Studio v1.76 : 175 Nodes Loaded
[ComfyUI] ------------------------------------------
[ComfyUI] ** For changes, please see patch notes at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/blob/main/Patch_Notes.md
[ComfyUI] ** For help, please see the wiki at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/wiki
[ComfyUI] ------------------------------------------
[ComfyUI] FizzleDorf Custom Nodes: Loaded
[ComfyUI] [tinyterraNodes] Loaded
[ComfyUI] Please 'pip install xformers'
[ComfyUI] Nvidia APEX normalization not installed, using PyTorch LayerNorm
[ComfyUI] [ReActor] - STATUS - Running v0.5.2-a1 in ComfyUI
[ComfyUI] Torch version: 2.5.1+cu124
[ComfyUI]
[ComfyUI] Efficiency Nodes: Attempting to add Control Net options to the 'HiRes-Fix Script' Node (comfyui_controlnet_aux add-on)...Success!
[ComfyUI] Efficiency Nodes Warning: Failed to import python package 'simpleeval'; related nodes disabled.
[ComfyUI]
[ComfyUI]
[ComfyUI] [rgthree-comfy] Loaded 42 fantastic nodes. 🎉
[ComfyUI]
[ComfyUI] WAS Node Suite: OpenCV Python FFMPEG support is enabled
[ComfyUI] WAS Node Suite: `ffmpeg_bin_path` is set to: /usr/bin/ffmpeg
[ComfyUI] WAS Node Suite: Finished. Loaded 218 nodes successfully.
[ComfyUI] encoded latents shape torch.Size([1, 16, 26, 52, 96])
[ComfyUI] Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
[ComfyUI] Loading checkpoint shards: 25%|██▌ | 1/4 [00:00<00:02, 1.46it/s]
[ComfyUI] Loading checkpoint shards: 50%|█████ | 2/4 [00:01<00:01, 1.48it/s]
[ComfyUI] Loading checkpoint shards: 75%|███████▌ | 3/4 [00:02<00:00, 1.49it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 2.10it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.82it/s]
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
Executing node 30, title: HunyuanVideo TextEncode, class type: HyVideoTextEncode
[ComfyUI] llm prompt attention_mask shape: torch.Size([1, 161]), masked tokens: 19
[ComfyUI] clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 20
Executing node 1, title: HunyuanVideo Model Loader, class type: HyVideoModelLoader
[ComfyUI] Using accelerate to load and assign model weights to device...
[ComfyUI] Input (height, width, video_length) = (416, 768, 101)
Executing node 3, title: HunyuanVideo Sampler, class type: HyVideoSampler
[ComfyUI] Sampling 101 frames in 26 latents at 768x416 with 25 inference steps
[ComfyUI] Scheduler config: FrozenDict([('num_train_timesteps', 1000), ('shift', 9.0), ('reverse', True), ('solver', 'euler'), ('n_tokens', None), ('_use_default_values', ['num_train_timesteps', 'n_tokens'])])
[ComfyUI] tensor([978.2609, 972.9730, 967.2897, 961.1651, 954.5454, 947.3684, 939.5604, 931.0344, 921.6867, 911.3924, 900.0000, 887.3240, 873.1343, 857.1429, 838.9830, 818.1818, 794.1176, 765.9575, 732.5581, 692.3077, 642.8571, 580.6452, 500.0001, 391.3044, 236.8421], device='cuda:0')
[ComfyUI]
[ComfyUI] 0%| | 0/25 [00:00<?, ?it/s]
[ComfyUI] 4%|▍ | 1/25 [00:07<03:11, 7.99s/it]
[ComfyUI] 8%|▊ | 2/25 [00:16<03:05, 8.05s/it]
[ComfyUI] 12%|█▏ | 3/25 [00:24<02:57, 8.07s/it]
[ComfyUI] 16%|█▌ | 4/25 [00:32<02:49, 8.07s/it]
[ComfyUI] 20%|██ | 5/25 [00:40<02:41, 8.09s/it]
[ComfyUI] 24%|██▍ | 6/25 [00:48<02:33, 8.08s/it]
[ComfyUI] 28%|██▊ | 7/25 [00:56<02:25, 8.10s/it]
[ComfyUI] 32%|███▏ | 8/25 [01:04<02:17, 8.10s/it]
[ComfyUI] 36%|███▌ | 9/25 [01:12<02:10, 8.16s/it]
[ComfyUI] 40%|████ | 10/25 [01:21<02:02, 8.14s/it]
[ComfyUI] 44%|████▍ | 11/25 [01:29<01:53, 8.14s/it]
[ComfyUI] 48%|████▊ | 12/25 [01:37<01:45, 8.14s/it]
[ComfyUI] 52%|█████▏ | 13/25 [01:45<01:37, 8.14s/it]
[ComfyUI] 56%|█████▌ | 14/25 [01:53<01:29, 8.13s/it]
[ComfyUI] 60%|██████ | 15/25 [02:01<01:21, 8.12s/it]
[ComfyUI] 64%|██████▍ | 16/25 [02:09<01:13, 8.12s/it]
[ComfyUI] 68%|██████▊ | 17/25 [02:17<01:05, 8.13s/it]
[ComfyUI] 72%|███████▏ | 18/25 [02:26<00:56, 8.14s/it]
[ComfyUI] 76%|███████▌ | 19/25 [02:34<00:48, 8.13s/it]
[ComfyUI] 80%|████████ | 20/25 [02:42<00:40, 8.14s/it]
[ComfyUI] 84%|████████▍ | 21/25 [02:50<00:32, 8.13s/it]
[ComfyUI] 88%|████████▊ | 22/25 [02:58<00:24, 8.13s/it]
[ComfyUI] 92%|█████████▏| 23/25 [03:06<00:16, 8.12s/it]
[ComfyUI] 96%|█████████▌| 24/25 [03:14<00:08, 8.13s/it]
[ComfyUI] 100%|██████████| 25/25 [03:22<00:00, 8.13s/it]
[ComfyUI] 100%|██████████| 25/25 [03:22<00:00, 8.12s/it]
[ComfyUI] Allocated memory: memory=12.306 GB
[ComfyUI] Max allocated memory: max_memory=20.619 GB
[ComfyUI] Max reserved memory: max_reserved=22.875 GB
Executing node 5, title: HunyuanVideo Decode, class type: HyVideoDecode
[ComfyUI]
[ComfyUI] Decoding rows: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows: 33%|███▎ | 1/3 [00:01<00:03, 1.51s/it]
[ComfyUI] Decoding rows: 67%|██████▋ | 2/3 [00:03<00:01, 1.56s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:03<00:00, 1.04s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:03<00:00, 1.18s/it]
[ComfyUI]
[ComfyUI] Blending tiles: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 58.36it/s]
[ComfyUI]
[ComfyUI] Decoding rows: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows: 33%|███▎ | 1/3 [00:01<00:02, 1.22s/it]
[ComfyUI] Decoding rows: 67%|██████▋ | 2/3 [00:02<00:01, 1.27s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:02<00:00, 1.18it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:02<00:00, 1.05it/s]
[ComfyUI]
[ComfyUI] Blending tiles: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 66.36it/s]
[ComfyUI]
[ComfyUI] Decoding rows: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows: 33%|███▎ | 1/3 [00:00<00:00, 7.03it/s]
[ComfyUI] Decoding rows: 67%|██████▋ | 2/3 [00:00<00:00, 7.17it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:00<00:00, 9.50it/s]
[ComfyUI]
[ComfyUI] Blending tiles: 0%| | 0/3 [00:00<?, ?it/s]
Executing node 44, title: Image Concatenate Multi, class type: ImageConcatMulti
Executing node 53, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 96.34it/s]
[ComfyUI] Prompt executed in 241.31 seconds
outputs: {'39': {'text': ['101x768x416']}, '53': {'gifs': [{'filename': 'HunhuyanVideo_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4', 'frame_rate': 24.0}]}}
====================================
HunhuyanVideo_00001.png
HunhuyanVideo_00001.mp4
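Several numbers in this log can be cross-checked against the inputs. The sketch below is reverse-engineered from the logged values, not taken from the sampler's source: it assumes the VAE compresses time by 4x and space by 8x into 16 channels, that the number of steps actually run is `int(steps * denoise_strength)`, and that the scheduler applies the shift formula `shift * s / (1 + (shift - 1) * s)` to a linear schedule. Under those assumptions, 30 requested steps at a denoise strength of 0.85 give the 25 inference steps and the timestep tensor shown in the log.

```python
def latent_shape(frames, height, width, channels=16, t_factor=4, s_factor=8):
    """Latent tensor shape for a clip, under the assumed compression factors."""
    return (1, channels, (frames - 1) // t_factor + 1,
            height // s_factor, width // s_factor)


def shifted_timesteps(steps, shift, denoise_strength, num_train_timesteps=1000):
    """Timesteps left after skipping the earliest steps (assumed formula)."""
    kept = int(steps * denoise_strength)  # 30 * 0.85 -> 25 steps are run
    timesteps = []
    for i in range(steps - kept, steps):
        s = 1.0 - i / steps  # linear schedule from 1 down toward 0
        shifted = shift * s / (1.0 + (shift - 1.0) * s)
        timesteps.append(num_train_timesteps * shifted)
    return timesteps


if __name__ == "__main__":
    print(latent_shape(101, 416, 768))
    print([round(t, 4) for t in shifted_timesteps(30, 9.0, 0.85)])
```

With the example inputs, `latent_shape(101, 416, 768)` matches the logged "encoded latents shape", and the first and last computed timesteps agree with the logged tensor (about 978.26 and 236.84).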
Version Details
- Version ID: d550f226f28b1030c2fedd2947f39f19b4b0233b50364904538caaf037fb18d3
- Version Created: December 11, 2024