zsxkib/hunyuan-video2video 🔢🖼️📝✓ → 🖼️

▶️ 3.1K runs 📅 Dec 2024 ⚙️ Cog 0.13.6 📄 Paper ⚖️ License

video-editing video-style-transfer video-to-video

Performance

241.7s Typical run time

3.1K Total runs

About

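A video-to-video model built on HunyuanVideo, a state-of-the-art text-to-video generation model capable of creating high-quality videos with realistic motion from text descriptions. It takes an input video and a text prompt and produces a restyled output video.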

Example Output

Prompt:

"high quality nature video of a excited Bengal Tiger walking through the grass, masterpiece, best quality"

Output

Performance Metrics

241.75s Prediction Time

322.67s Total Time

All Input Parameters

{
  "crf": 19,
  "steps": 30,
  "video": "https://replicate.delivery/pbxt/M5n5MuDgBxhSERj6PvHgz4BJcOUdHc9o1ZBXz454GoGP5DrR/2024-12-03-18%3A25%3A47_seed47039_A%20cat%20walks%20on%20the%20grass%2C%20realistic%20style..mp4",
  "width": 768,
  "height": 768,
  "prompt": "high quality nature video of a excited Bengal Tiger walking through the grass, masterpiece, best quality",
  "flow_shift": 9,
  "force_rate": 0,
  "force_size": "Disabled",
  "frame_rate": 24,
  "custom_width": 512,
  "custom_height": 512,
  "frame_load_cap": 101,
  "guidance_scale": 6,
  "keep_proportion": true,
  "denoise_strength": 0.85,
  "select_every_nth": 1,
  "skip_first_frames": 0
}
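The parameters above could be sent to the model with the Replicate Python client. The following is a minimal sketch, not the author's reference code: it assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set, and the helper names `build_input` and `run_prediction` are hypothetical. The version hash is the one listed under Version Details, and the video URL in the usage note is a placeholder.

```python
MODEL = "zsxkib/hunyuan-video2video"
VERSION = "d550f226f28b1030c2fedd2947f39f19b4b0233b50364904538caaf037fb18d3"

# Values copied from the example above; `video` and `prompt` are filled in per call.
EXAMPLE_INPUT = {
    "crf": 19,
    "steps": 30,
    "width": 768,
    "height": 768,
    "flow_shift": 9,
    "force_rate": 0,
    "force_size": "Disabled",
    "frame_rate": 24,
    "custom_width": 512,
    "custom_height": 512,
    "frame_load_cap": 101,
    "guidance_scale": 6,
    "keep_proportion": True,
    "denoise_strength": 0.85,
    "select_every_nth": 1,
    "skip_first_frames": 0,
}


def build_input(video, prompt, **overrides):
    """Return a full input payload (hypothetical helper, not part of the model)."""
    payload = dict(EXAMPLE_INPUT)
    payload.update(overrides)
    payload["video"] = video
    payload["prompt"] = prompt
    return payload


def run_prediction(video, prompt, **overrides):
    """Call the pinned model version; requires network access and an API token."""
    import replicate  # imported lazily so the payload helper works without it

    return replicate.run(
        f"{MODEL}:{VERSION}",
        input=build_input(video, prompt, **overrides),
    )
```

A call such as `run_prediction("https://example.com/input.mp4", "a tiger walking through the grass", steps=20)` would override only `steps`. Depending on the client version, the return value may be a URL string or a file-like object wrapping that URL.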

Input Parameters

crf (Type: integer, Default: 19, Range: 0 - 51): CRF value for output video quality (0-51). Lower values = better quality.
seed (Type: integer): Set a seed for reproducibility. Random by default.
steps (Type: integer, Default: 30, Range: 1 - 150): Number of sampling (denoising) steps.
video (required) (Type: string): Input video file.
width (Type: integer, Default: 768, Range: 64 - 2048): Output video width (divisible by 16 for best performance).
height (Type: integer, Default: 768, Range: 64 - 2048): Output video height (divisible by 16 for best performance).
prompt (Type: string, Default: "high quality nature video of a excited brown bear walking through the grass, masterpiece, best quality"): Text prompt describing the desired output video style. Be descriptive.
flow_shift (Type: integer, Default: 9, Range: 1 - 20): Flow shift for temporal consistency. Adjust to tweak video smoothness.
force_rate (Type: integer, Default: 0, Range: 0 - 240): Force a new frame rate on the input video. 0 means no change.
force_size (Type: string, Default: Disabled): Force resize method. 'Disabled' means original size. Otherwise applies custom_width/height.
frame_rate (Type: integer, Default: 24, Range: 1 - 120): Frame rate of the output video.
custom_width (Type: integer, Default: 512, Range: 64 - 2048): Custom width if force_size is not 'Disabled'.
custom_height (Type: integer, Default: 512, Range: 64 - 2048): Custom height if force_size is not 'Disabled'.
frame_load_cap (Type: integer, Default: 101, Range: 1 - ∞): Max frames to load from input video.
guidance_scale (Type: number, Default: 6, Range: 1 - 20): Embedded guidance scale. Higher values follow the prompt more strictly.
keep_proportion (Type: boolean, Default: true): Keep aspect ratio when resizing. If true, will adjust dimensions proportionally.
denoise_strength (Type: number, Default: 0.85, Range: 0 - 1): Denoise strength (0.0 to 1.0). Higher = more deviation from input content.
select_every_nth (Type: integer, Default: 1, Range: 1 - ∞): Use every nth frame (1 = every frame, 2 = every second frame, etc.).
skip_first_frames (Type: integer, Default: 0, Range: 0 - ∞): Number of initial frames to skip from the input video.
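The three frame-selection parameters (skip_first_frames, select_every_nth, frame_load_cap) interact, and a small sketch may make that clearer. The order of operations used here (skip first, then subsample, then cap) is an assumption based on the parameter descriptions, not something confirmed against the video loader's source, and `selected_frame_indices` is a hypothetical helper.

```python
def selected_frame_indices(
    total_frames,
    skip_first_frames=0,
    select_every_nth=1,
    frame_load_cap=101,
):
    """Return the input-frame indices that would be loaded.

    Assumed order: drop the first `skip_first_frames` frames, keep every
    `select_every_nth` frame of the rest, then stop after `frame_load_cap` frames.
    """
    if select_every_nth < 1 or frame_load_cap < 1 or skip_first_frames < 0:
        raise ValueError("parameter outside the documented range")
    candidates = range(skip_first_frames, total_frames, select_every_nth)
    return list(candidates)[:frame_load_cap]
```

Under these assumptions, a 300-frame input with the default settings yields frames 0 through 100, which matches the 101 frames reported in the example logs.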

Output Schema

Output

Type: string • Format: uri
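Because the output is a URI string, a client typically has to download the file itself. The sketch below uses only the Python standard library; `output_filename` and `download_output` are hypothetical helper names, and the example URL in the usage note is a placeholder rather than a real output.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def output_filename(uri, default="output.mp4"):
    """Derive a local file name from the last path segment of the URI."""
    name = Path(unquote(urlparse(uri).path)).name
    return name or default


def download_output(uri, dest_dir="."):
    """Stream the file at `uri` into `dest_dir` and return the local path."""
    dest = Path(dest_dir) / output_filename(uri)
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest
```

For instance, `download_output("https://example.com/files/HunhuyanVideo_00001.mp4", "/tmp")` would write `/tmp/HunhuyanVideo_00001.mp4`.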

Example Execution Logs

Checking inputs
✅ /tmp/inputs/input.mp4
====================================
Checking weights
✅ hunyuan_video_vae_bf16.safetensors exists in ComfyUI/models/vae
✅ hunyuan_video_720_fp8_e4m3fn.safetensors exists in ComfyUI/models/diffusion_models
====================================
Running workflow
[ComfyUI] got prompt
Executing node 43, title: Load Video (Upload) 🎥🅥🅗🅢, class type: VHS_LoadVideo
Executing node 42, title: Resize Image, class type: ImageResizeKJ
Executing node 39, title: Get Image Size & Count, class type: GetImageSizeAndCount
Executing node 7, title: HunyuanVideo VAE Loader, class type: HyVideoVAELoader
Executing node 38, title: HunyuanVideo Encode, class type: HyVideoEncode
Executing node 16, title: (Down)Load HunyuanVideo TextEncoder, class type: DownloadAndLoadHyVideoTextEncoder
[ComfyUI] Loading text encoder model (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Loading text encoder model (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
[ComfyUI] ColorMod: Can't find pypng! Please install to enable 16bit image support.
[ComfyUI] ColorMod: Ignoring node 'CV2TonemapDurand' due to cv2 edition/version
[ComfyUI] ------------------------------------------
[ComfyUI]
[ComfyUI] Comfyroll Studio v1.76 : 175 Nodes Loaded
[ComfyUI] ------------------------------------------
[ComfyUI] ** For changes, please see patch notes at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/blob/main/Patch_Notes.md
[ComfyUI] ** For help, please see the wiki at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/wiki
[ComfyUI] ------------------------------------------
[ComfyUI] FizzleDorf Custom Nodes: Loaded
[ComfyUI] [tinyterraNodes] Loaded
[ComfyUI] Please 'pip install xformers'
[ComfyUI] Nvidia APEX normalization not installed, using PyTorch LayerNorm
[ComfyUI] [ReActor] - STATUS - Running v0.5.2-a1 in ComfyUI
[ComfyUI] Torch version: 2.5.1+cu124
[ComfyUI]
[ComfyUI] Efficiency Nodes: Attempting to add Control Net options to the 'HiRes-Fix Script' Node (comfyui_controlnet_aux add-on)...Success!
[ComfyUI] Efficiency Nodes Warning: Failed to import python package 'simpleeval'; related nodes disabled.
[ComfyUI]
[ComfyUI]
[ComfyUI] [rgthree-comfy] Loaded 42 fantastic nodes. 🎉
[ComfyUI]
[ComfyUI] WAS Node Suite: OpenCV Python FFMPEG support is enabled
[ComfyUI] WAS Node Suite: `ffmpeg_bin_path` is set to: /usr/bin/ffmpeg
[ComfyUI] WAS Node Suite: Finished. Loaded 218 nodes successfully.
[ComfyUI] encoded latents shape torch.Size([1, 16, 26, 52, 96])
[ComfyUI] Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
[ComfyUI] Loading checkpoint shards:  25%|██▌       | 1/4 [00:00<00:02,  1.46it/s]
[ComfyUI] Loading checkpoint shards:  50%|█████     | 2/4 [00:01<00:01,  1.48it/s]
[ComfyUI] Loading checkpoint shards:  75%|███████▌  | 3/4 [00:02<00:00,  1.49it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  2.10it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.82it/s]
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
Executing node 30, title: HunyuanVideo TextEncode, class type: HyVideoTextEncode
[ComfyUI] llm prompt attention_mask shape: torch.Size([1, 161]), masked tokens: 19
[ComfyUI] clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 20
Executing node 1, title: HunyuanVideo Model Loader, class type: HyVideoModelLoader
[ComfyUI] Using accelerate to load and assign model weights to device...
[ComfyUI] Input (height, width, video_length) = (416, 768, 101)
Executing node 3, title: HunyuanVideo Sampler, class type: HyVideoSampler
[ComfyUI] Sampling 101 frames in 26 latents at 768x416 with 25 inference steps
[ComfyUI] Scheduler config: FrozenDict([('num_train_timesteps', 1000), ('shift', 9.0), ('reverse', True), ('solver', 'euler'), ('n_tokens', None), ('_use_default_values', ['num_train_timesteps', 'n_tokens'])])
[ComfyUI] tensor([978.2609, 972.9730, 967.2897, 961.1651, 954.5454, 947.3684, 939.5604, 931.0344, 921.6867, 911.3924, 900.0000, 887.3240, 873.1343, 857.1429, 838.9830, 818.1818, 794.1176, 765.9575, 732.5581, 692.3077, 642.8571, 580.6452, 500.0001, 391.3044, 236.8421], device='cuda:0')
[ComfyUI]
[ComfyUI] 0%|          | 0/25 [00:00<?, ?it/s]
[ComfyUI] 4%|▍         | 1/25 [00:07<03:11,  7.99s/it]
[ComfyUI] 8%|▊         | 2/25 [00:16<03:05,  8.05s/it]
[ComfyUI] 12%|█▏        | 3/25 [00:24<02:57,  8.07s/it]
[ComfyUI] 16%|█▌        | 4/25 [00:32<02:49,  8.07s/it]
[ComfyUI] 20%|██        | 5/25 [00:40<02:41,  8.09s/it]
[ComfyUI] 24%|██▍       | 6/25 [00:48<02:33,  8.08s/it]
[ComfyUI] 28%|██▊       | 7/25 [00:56<02:25,  8.10s/it]
[ComfyUI] 32%|███▏      | 8/25 [01:04<02:17,  8.10s/it]
[ComfyUI] 36%|███▌      | 9/25 [01:12<02:10,  8.16s/it]
[ComfyUI] 40%|████      | 10/25 [01:21<02:02,  8.14s/it]
[ComfyUI] 44%|████▍     | 11/25 [01:29<01:53,  8.14s/it]
[ComfyUI] 48%|████▊     | 12/25 [01:37<01:45,  8.14s/it]
[ComfyUI] 52%|█████▏    | 13/25 [01:45<01:37,  8.14s/it]
[ComfyUI] 56%|█████▌    | 14/25 [01:53<01:29,  8.13s/it]
[ComfyUI] 60%|██████    | 15/25 [02:01<01:21,  8.12s/it]
[ComfyUI] 64%|██████▍   | 16/25 [02:09<01:13,  8.12s/it]
[ComfyUI] 68%|██████▊   | 17/25 [02:17<01:05,  8.13s/it]
[ComfyUI] 72%|███████▏  | 18/25 [02:26<00:56,  8.14s/it]
[ComfyUI] 76%|███████▌  | 19/25 [02:34<00:48,  8.13s/it]
[ComfyUI] 80%|████████  | 20/25 [02:42<00:40,  8.14s/it]
[ComfyUI] 84%|████████▍ | 21/25 [02:50<00:32,  8.13s/it]
[ComfyUI] 88%|████████▊ | 22/25 [02:58<00:24,  8.13s/it]
[ComfyUI] 92%|█████████▏| 23/25 [03:06<00:16,  8.12s/it]
[ComfyUI] 96%|█████████▌| 24/25 [03:14<00:08,  8.13s/it]
[ComfyUI] 100%|██████████| 25/25 [03:22<00:00,  8.13s/it]
[ComfyUI] 100%|██████████| 25/25 [03:22<00:00,  8.12s/it]
[ComfyUI] Allocated memory: memory=12.306 GB
[ComfyUI] Max allocated memory: max_memory=20.619 GB
[ComfyUI] Max reserved memory: max_reserved=22.875 GB
Executing node 5, title: HunyuanVideo Decode, class type: HyVideoDecode
[ComfyUI]
[ComfyUI] Decoding rows:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows:  33%|███▎      | 1/3 [00:01<00:03,  1.51s/it]
[ComfyUI] Decoding rows:  67%|██████▋   | 2/3 [00:03<00:01,  1.56s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:03<00:00,  1.04s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:03<00:00,  1.18s/it]
[ComfyUI]
[ComfyUI] Blending tiles:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 58.36it/s]
[ComfyUI]
[ComfyUI] Decoding rows:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows:  33%|███▎      | 1/3 [00:01<00:02,  1.22s/it]
[ComfyUI] Decoding rows:  67%|██████▋   | 2/3 [00:02<00:01,  1.27s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:02<00:00,  1.18it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:02<00:00,  1.05it/s]
[ComfyUI]
[ComfyUI] Blending tiles:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 66.36it/s]
[ComfyUI]
[ComfyUI] Decoding rows:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows:  33%|███▎      | 1/3 [00:00<00:00,  7.03it/s]
[ComfyUI] Decoding rows:  67%|██████▋   | 2/3 [00:00<00:00,  7.17it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:00<00:00,  9.50it/s]
[ComfyUI]
[ComfyUI] Blending tiles:   0%|          | 0/3 [00:00<?, ?it/s]
Executing node 44, title: Image Concatenate Multi, class type: ImageConcatMulti
Executing node 53, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 96.34it/s]
[ComfyUI] Prompt executed in 241.31 seconds
outputs:  {'39': {'text': ['101x768x416']}, '53': {'gifs': [{'filename': 'HunhuyanVideo_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4', 'frame_rate': 24.0}]}}
====================================
HunhuyanVideo_00001.png
HunhuyanVideo_00001.mp4
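Several numbers in the logs above can be reproduced with simple arithmetic, which may help when estimating run time for other settings. The sketch below is inferred from the logged values only, not from the sampler's source: it assumes a 4x temporal and 8x spatial VAE compression with 16 latent channels (consistent with the logged latent shape), and it assumes the sampler keeps `int(steps * denoise_strength)` of the scheduled steps, which would explain why a request for 30 steps at strength 0.85 ran 25. Both function names are hypothetical.

```python
def latent_shape(frames, height, width, channels=16, t_stride=4, s_stride=8):
    """Latent tensor shape for a clip, under the assumed compression factors."""
    latent_frames = (frames - 1) // t_stride + 1
    return (1, channels, latent_frames, height // s_stride, width // s_stride)


def logged_timesteps(steps=30, denoise_strength=0.85, shift=9.0,
                     num_train_timesteps=1000):
    """Rebuild the timestep list printed before sampling.

    Assumption: sigmas run linearly from 1 down to 1/steps, are warped by
    shift * s / (1 + (shift - 1) * s), and only the last
    int(steps * denoise_strength) of them are used.
    """
    kept = int(steps * denoise_strength)
    sigmas = [(steps - i) / steps for i in range(steps)]
    shifted = [shift * s / (1 + (shift - 1) * s) for s in sigmas]
    return [num_train_timesteps * s for s in shifted[steps - kept:]]


if __name__ == "__main__":
    print(latent_shape(101, 416, 768))
    print([round(t, 4) for t in logged_timesteps()])
```

With the example's settings, `latent_shape(101, 416, 768)` gives the logged `[1, 16, 26, 52, 96]`, and `logged_timesteps()` returns 25 values running from about 978.26 down to about 236.84, in line with the tensor printed in the log.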

Version Details

Version ID: d550f226f28b1030c2fedd2947f39f19b4b0233b50364904538caaf037fb18d3
Version Created: December 11, 2024
