zsxkib/hunyuan-video2video
About
A video-to-video workflow built on HunyuanVideo, a state-of-the-art text-to-video generation model capable of creating high-quality videos with realistic motion from text descriptions.
Example Output
Prompt:
"high quality nature video of a excited Bengal Tiger walking through the grass, masterpiece, best quality"
Output
Performance Metrics
Prediction Time: 241.75s
Total Time: 322.67s
All Input Parameters
{
"crf": 19,
"steps": 30,
"video": "https://replicate.delivery/pbxt/M5n5MuDgBxhSERj6PvHgz4BJcOUdHc9o1ZBXz454GoGP5DrR/2024-12-03-18%3A25%3A47_seed47039_A%20cat%20walks%20on%20the%20grass%2C%20realistic%20style..mp4",
"width": 768,
"height": 768,
"prompt": "high quality nature video of a excited Bengal Tiger walking through the grass, masterpiece, best quality",
"flow_shift": 9,
"force_rate": 0,
"force_size": "Disabled",
"frame_rate": 24,
"custom_width": 512,
"custom_height": 512,
"frame_load_cap": 101,
"guidance_scale": 6,
"keep_proportion": true,
"denoise_strength": 0.85,
"select_every_nth": 1,
"skip_first_frames": 0
}
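The parameters above could be submitted programmatically. The following is a minimal sketch, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_prediction` are hypothetical and exist only for illustration; the values mirror the example payload.

```python
# Sketch only: helper names are illustrative, not part of the model's API.
MODEL = "zsxkib/hunyuan-video2video"
VERSION = "d550f226f28b1030c2fedd2947f39f19b4b0233b50364904538caaf037fb18d3"


def build_input(video_url: str, prompt: str) -> dict:
    """Return a payload that mirrors the example parameters listed above."""
    return {
        "video": video_url,
        "prompt": prompt,
        "crf": 19,
        "steps": 30,
        "width": 768,
        "height": 768,
        "flow_shift": 9,
        "force_rate": 0,
        "force_size": "Disabled",
        "frame_rate": 24,
        "custom_width": 512,
        "custom_height": 512,
        "frame_load_cap": 101,
        "guidance_scale": 6,
        "keep_proportion": True,
        "denoise_strength": 0.85,
        "select_every_nth": 1,
        "skip_first_frames": 0,
    }


def run_prediction(video_url: str, prompt: str):
    """Submit the prediction; needs network access and an API token."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(f"{MODEL}:{VERSION}", input=build_input(video_url, prompt))
```

Pinning the version ID keeps results tied to the version documented here; `seed` is left out, so it stays random as described in the parameter list.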
Input Parameters
- crf
- CRF value for output video quality (0-51). Lower values = better quality.
- seed
- Set a seed for reproducibility. Random by default.
- steps
- Number of sampling (denoising) steps.
- video (required)
- Input video file.
- width
- Output video width (divisible by 16 for best performance).
- height
- Output video height (divisible by 16 for best performance).
- prompt
- Text prompt describing the desired output video style. Be descriptive.
- flow_shift
- Flow shift for temporal consistency. Adjust to tweak video smoothness.
- force_rate
- Force a new frame rate on the input video. 0 means no change.
- force_size
- Force resize method. 'Disabled' means original size. Otherwise applies custom_width/height.
- frame_rate
- Frame rate of the output video.
- custom_width
- Custom width if force_size is not 'Disabled'.
- custom_height
- Custom height if force_size is not 'Disabled'.
- frame_load_cap
- Max frames to load from input video.
- guidance_scale
- Embedded guidance scale. Higher values follow the prompt more strictly.
- keep_proportion
- Keep aspect ratio when resizing. If true, will adjust dimensions proportionally.
- denoise_strength
- Denoise strength (0.0 to 1.0). Higher = more deviation from input content.
- select_every_nth
- Use every nth frame (1 = every frame, 2 = every second frame, etc.).
- skip_first_frames
- Number of initial frames to skip from the input video.
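The documented ranges above can be checked on the client before submitting a job. This is a minimal sketch under that assumption; `validate_inputs` is a hypothetical helper, and the model may enforce additional or different constraints on the server side.

```python
def validate_inputs(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # `video` is the only parameter marked as required.
    if not params.get("video"):
        problems.append("video is required")

    # CRF is documented as 0-51 (lower = better quality).
    crf = params.get("crf")
    if crf is not None and not 0 <= crf <= 51:
        problems.append(f"crf={crf} is outside 0-51")

    # Denoise strength is documented as 0.0 to 1.0.
    strength = params.get("denoise_strength")
    if strength is not None and not 0.0 <= strength <= 1.0:
        problems.append(f"denoise_strength={strength} is outside 0.0-1.0")

    # Divisibility by 16 is a performance recommendation, reported here as a warning.
    for key in ("width", "height"):
        value = params.get(key)
        if value is not None and value % 16 != 0:
            problems.append(f"{key}={value} is not divisible by 16")

    return problems
```

For example, `validate_inputs({"crf": 60, "width": 770})` reports the missing video, the out-of-range CRF, and the width warning.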
Output Schema
Output
Example Execution Logs
Checking inputs
✅ /tmp/inputs/input.mp4
====================================
Checking weights
✅ hunyuan_video_vae_bf16.safetensors exists in ComfyUI/models/vae
✅ hunyuan_video_720_fp8_e4m3fn.safetensors exists in ComfyUI/models/diffusion_models
====================================
Running workflow
[ComfyUI] got prompt
Executing node 43, title: Load Video (Upload) 🎥🅥🅗🅢, class type: VHS_LoadVideo
Executing node 42, title: Resize Image, class type: ImageResizeKJ
Executing node 39, title: Get Image Size & Count, class type: GetImageSizeAndCount
Executing node 7, title: HunyuanVideo VAE Loader, class type: HyVideoVAELoader
Executing node 38, title: HunyuanVideo Encode, class type: HyVideoEncode
Executing node 16, title: (Down)Load HunyuanVideo TextEncoder, class type: DownloadAndLoadHyVideoTextEncoder
[ComfyUI] Loading text encoder model (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Loading text encoder model (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
[ComfyUI] ColorMod: Can't find pypng! Please install to enable 16bit image support.
[ComfyUI] ColorMod: Ignoring node 'CV2TonemapDurand' due to cv2 edition/version
[ComfyUI] ------------------------------------------
[ComfyUI]
[ComfyUI] Comfyroll Studio v1.76 : 175 Nodes Loaded
[ComfyUI] ------------------------------------------
[ComfyUI] ** For changes, please see patch notes at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/blob/main/Patch_Notes.md
[ComfyUI] ** For help, please see the wiki at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/wiki
[ComfyUI] ------------------------------------------
[ComfyUI] FizzleDorf Custom Nodes: Loaded
[ComfyUI] [tinyterraNodes] Loaded
[ComfyUI] Please 'pip install xformers'
[ComfyUI] Nvidia APEX normalization not installed, using PyTorch LayerNorm
[ComfyUI] [ReActor] - STATUS - Running v0.5.2-a1 in ComfyUI
[ComfyUI] Torch version: 2.5.1+cu124
[ComfyUI]
[ComfyUI] Efficiency Nodes: Attempting to add Control Net options to the 'HiRes-Fix Script' Node (comfyui_controlnet_aux add-on)...Success!
[ComfyUI] Efficiency Nodes Warning: Failed to import python package 'simpleeval'; related nodes disabled.
[ComfyUI]
[ComfyUI]
[ComfyUI] [rgthree-comfy] Loaded 42 fantastic nodes. 🎉
[ComfyUI]
[ComfyUI] WAS Node Suite: OpenCV Python FFMPEG support is enabled
[ComfyUI] WAS Node Suite: `ffmpeg_bin_path` is set to: /usr/bin/ffmpeg
[ComfyUI] WAS Node Suite: Finished. Loaded 218 nodes successfully.
[ComfyUI] encoded latents shape torch.Size([1, 16, 26, 52, 96])
[ComfyUI] Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
[ComfyUI] Loading checkpoint shards: 25%|██▌ | 1/4 [00:00<00:02, 1.46it/s]
[ComfyUI] Loading checkpoint shards: 50%|█████ | 2/4 [00:01<00:01, 1.48it/s]
[ComfyUI] Loading checkpoint shards: 75%|███████▌ | 3/4 [00:02<00:00, 1.49it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 2.10it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.82it/s]
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
Executing node 30, title: HunyuanVideo TextEncode, class type: HyVideoTextEncode
[ComfyUI] llm prompt attention_mask shape: torch.Size([1, 161]), masked tokens: 19
[ComfyUI] clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 20
Executing node 1, title: HunyuanVideo Model Loader, class type: HyVideoModelLoader
[ComfyUI] Using accelerate to load and assign model weights to device...
[ComfyUI] Input (height, width, video_length) = (416, 768, 101)
Executing node 3, title: HunyuanVideo Sampler, class type: HyVideoSampler
[ComfyUI] Sampling 101 frames in 26 latents at 768x416 with 25 inference steps
[ComfyUI] Scheduler config: FrozenDict([('num_train_timesteps', 1000), ('shift', 9.0), ('reverse', True), ('solver', 'euler'), ('n_tokens', None), ('_use_default_values', ['num_train_timesteps', 'n_tokens'])])
[ComfyUI] tensor([978.2609, 972.9730, 967.2897, 961.1651, 954.5454, 947.3684, 939.5604, 931.0344, 921.6867, 911.3924, 900.0000, 887.3240, 873.1343, 857.1429, 838.9830, 818.1818, 794.1176, 765.9575, 732.5581, 692.3077, 642.8571, 580.6452, 500.0001, 391.3044, 236.8421], device='cuda:0')
[ComfyUI]
[ComfyUI] 0%| | 0/25 [00:00<?, ?it/s]
[ComfyUI] 4%|▍ | 1/25 [00:07<03:11, 7.99s/it]
[ComfyUI] 8%|▊ | 2/25 [00:16<03:05, 8.05s/it]
[ComfyUI] 12%|█▏ | 3/25 [00:24<02:57, 8.07s/it]
[ComfyUI] 16%|█▌ | 4/25 [00:32<02:49, 8.07s/it]
[ComfyUI] 20%|██ | 5/25 [00:40<02:41, 8.09s/it]
[ComfyUI] 24%|██▍ | 6/25 [00:48<02:33, 8.08s/it]
[ComfyUI] 28%|██▊ | 7/25 [00:56<02:25, 8.10s/it]
[ComfyUI] 32%|███▏ | 8/25 [01:04<02:17, 8.10s/it]
[ComfyUI] 36%|███▌ | 9/25 [01:12<02:10, 8.16s/it]
[ComfyUI] 40%|████ | 10/25 [01:21<02:02, 8.14s/it]
[ComfyUI] 44%|████▍ | 11/25 [01:29<01:53, 8.14s/it]
[ComfyUI] 48%|████▊ | 12/25 [01:37<01:45, 8.14s/it]
[ComfyUI] 52%|█████▏ | 13/25 [01:45<01:37, 8.14s/it]
[ComfyUI] 56%|█████▌ | 14/25 [01:53<01:29, 8.13s/it]
[ComfyUI] 60%|██████ | 15/25 [02:01<01:21, 8.12s/it]
[ComfyUI] 64%|██████▍ | 16/25 [02:09<01:13, 8.12s/it]
[ComfyUI] 68%|██████▊ | 17/25 [02:17<01:05, 8.13s/it]
[ComfyUI] 72%|███████▏ | 18/25 [02:26<00:56, 8.14s/it]
[ComfyUI] 76%|███████▌ | 19/25 [02:34<00:48, 8.13s/it]
[ComfyUI] 80%|████████ | 20/25 [02:42<00:40, 8.14s/it]
[ComfyUI] 84%|████████▍ | 21/25 [02:50<00:32, 8.13s/it]
[ComfyUI] 88%|████████▊ | 22/25 [02:58<00:24, 8.13s/it]
[ComfyUI] 92%|█████████▏| 23/25 [03:06<00:16, 8.12s/it]
[ComfyUI] 96%|█████████▌| 24/25 [03:14<00:08, 8.13s/it]
[ComfyUI] 100%|██████████| 25/25 [03:22<00:00, 8.13s/it]
[ComfyUI] 100%|██████████| 25/25 [03:22<00:00, 8.12s/it]
[ComfyUI] Allocated memory: memory=12.306 GB
[ComfyUI] Max allocated memory: max_memory=20.619 GB
[ComfyUI] Max reserved memory: max_reserved=22.875 GB
Executing node 5, title: HunyuanVideo Decode, class type: HyVideoDecode
[ComfyUI]
[ComfyUI] Decoding rows: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows: 33%|███▎ | 1/3 [00:01<00:03, 1.51s/it]
[ComfyUI] Decoding rows: 67%|██████▋ | 2/3 [00:03<00:01, 1.56s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:03<00:00, 1.04s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:03<00:00, 1.18s/it]
[ComfyUI]
[ComfyUI] Blending tiles: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 58.36it/s]
[ComfyUI]
[ComfyUI] Decoding rows: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows: 33%|███▎ | 1/3 [00:01<00:02, 1.22s/it]
[ComfyUI] Decoding rows: 67%|██████▋ | 2/3 [00:02<00:01, 1.27s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:02<00:00, 1.18it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:02<00:00, 1.05it/s]
[ComfyUI]
[ComfyUI] Blending tiles: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 66.36it/s]
[ComfyUI]
[ComfyUI] Decoding rows: 0%| | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows: 33%|███▎ | 1/3 [00:00<00:00, 7.03it/s]
[ComfyUI] Decoding rows: 67%|██████▋ | 2/3 [00:00<00:00, 7.17it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:00<00:00, 9.50it/s]
[ComfyUI]
[ComfyUI] Blending tiles: 0%| | 0/3 [00:00<?, ?it/s]
Executing node 44, title: Image Concatenate Multi, class type: ImageConcatMulti
Executing node 53, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 96.34it/s]
[ComfyUI] Prompt executed in 241.31 seconds
outputs: {'39': {'text': ['101x768x416']}, '53': {'gifs': [{'filename': 'HunhuyanVideo_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4', 'frame_rate': 24.0}]}}
====================================
HunhuyanVideo_00001.png
HunhuyanVideo_00001.mp4
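Several numbers in the logs above can be reproduced from the input parameters. The sketch below is inferred from the logged values alone, not from the workflow's source code, so every formula in it is an assumption: that the sampler keeps `int(steps * denoise_strength)` steps, that the VAE compresses by 4x in time and 8x in space into 16 channels, and that timesteps follow the shift formula `shift * s / (1 + (shift - 1) * s)` on a linear grid. The function names are hypothetical.

```python
def effective_steps(steps: int, denoise_strength: float) -> int:
    """Assumed rule: only the last fraction of the schedule is sampled."""
    return int(steps * denoise_strength)


def latent_shape(frames: int, height: int, width: int, channels: int = 16) -> tuple:
    """Assumed VAE compression: 4x temporal (first frame kept), 8x spatial."""
    return (1, channels, (frames - 1) // 4 + 1, height // 8, width // 8)


def shifted_timesteps(steps: int, denoise_strength: float, shift: float,
                      num_train_timesteps: int = 1000) -> list:
    """Assumed schedule: linear sigmas k/steps, warped by the flow shift."""
    kept = effective_steps(steps, denoise_strength)
    timesteps = []
    for k in range(kept, 0, -1):
        sigma = k / steps
        shifted = shift * sigma / (1 + (shift - 1) * sigma)
        timesteps.append(num_train_timesteps * shifted)
    return timesteps


if __name__ == "__main__":
    print(effective_steps(30, 0.85))            # logged: 25 inference steps
    print(latent_shape(101, 416, 768))          # logged: [1, 16, 26, 52, 96]
    ts = shifted_timesteps(30, 0.85, 9.0)
    print(round(ts[0], 4), round(ts[-1], 4))    # logged: 978.2609 ... 236.8421
```

Under these assumptions, `steps` of 30 with `denoise_strength` of 0.85 accounts for the 25 sampling steps in the progress bar, and `flow_shift` of 9 matches the `shift` of 9.0 in the logged scheduler config. The logged 768x416 resolution, rather than the requested 768x768, is consistent with `keep_proportion` being true.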
Version Details
- Version ID
- d550f226f28b1030c2fedd2947f39f19b4b0233b50364904538caaf037fb18d3
- Version Created
- December 11, 2024