zsxkib/hunyuan-video-lora

42.8K runs, Dec 2024, Cog 0.13.6
text-to-video video-lora-training

About

Hunyuan-Video LoRA Explorer + Trainer

Example Output

Prompt:

"In the style of RSNG. A woman with blonde hair stands on a balcony at night, framed against a backdrop of city lights. She wears a white crop top and a dark jacket, exuding a confident presence as she gazes directly at the camera"

Output

Performance Metrics

94.37s Prediction Time
123.20s Total Time
All Input Parameters
{
  "crf": 19,
  "steps": 30,
  "width": 512,
  "height": 512,
  "prompt": "In the style of RSNG. A woman with blonde hair stands on a balcony at night, framed against a backdrop of city lights. She wears a white crop top and a dark jacket, exuding a confident presence as she gazes directly at the camera",
  "lora_url": "lucataco/hunyuan-musubi-rose-6",
  "scheduler": "DPMSolverMultistepScheduler",
  "flow_shift": 9,
  "frame_rate": 15,
  "num_frames": 33,
  "enhance_end": 1,
  "enhance_start": 0,
  "force_offload": true,
  "lora_strength": 1,
  "enhance_double": true,
  "enhance_single": true,
  "enhance_weight": 0.3,
  "guidance_scale": 6,
  "denoise_strength": 1
}
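The payload above could be submitted through Replicate's Python client. The following is a minimal sketch, assuming the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_prediction` are illustrative and not part of the model, and the version hash is the one listed under Version Details.

```python
MODEL = "zsxkib/hunyuan-video-lora"
VERSION = "0e946318f53ed9d89a75cc48c6697a696f2b5e8981e74507a76ab557a938783d"


def build_input(prompt: str, lora_url: str, **overrides) -> dict:
    """Return the example payload from this page, with optional overrides."""
    payload = {
        "crf": 19,
        "steps": 30,
        "width": 512,
        "height": 512,
        "prompt": prompt,
        "lora_url": lora_url,
        "scheduler": "DPMSolverMultistepScheduler",
        "flow_shift": 9,
        "frame_rate": 15,
        "num_frames": 33,
        "enhance_end": 1,
        "enhance_start": 0,
        "force_offload": True,
        "lora_strength": 1,
        "enhance_double": True,
        "enhance_single": True,
        "enhance_weight": 0.3,
        "guidance_scale": 6,
        "denoise_strength": 1,
    }
    payload.update(overrides)
    return payload


def run_prediction(payload: dict):
    """Submit the payload to Replicate (needs network access and an API token)."""
    import replicate  # third-party client, imported lazily so the helpers above work without it

    return replicate.run(f"{MODEL}:{VERSION}", input=payload)
```

Calling `run_prediction(build_input("In the style of RSNG. ...", "lucataco/hunyuan-musubi-rose-6"))` would then start a prediction; the exact return type depends on the installed client version.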
Input Parameters
crf Type: integer, Default: 19, Range: 0 - 51
CRF (quality) for H264 encoding. Lower values = higher quality.
seed Type: integer
Set a seed for reproducibility. Random by default.
steps Type: integer, Default: 50, Range: 1 - 150
Number of diffusion steps.
width Type: integer, Default: 640, Range: 64 - 1536
Width for the generated video.
height Type: integer, Default: 360, Range: 64 - 1024
Height for the generated video.
prompt Type: string, Default: (empty string)
The text prompt describing your video scene.
lora_url Type: string, Default: (empty string)
A URL pointing to your LoRA .safetensors file or a Hugging Face repo (e.g. 'user/repo' - uses the first .safetensors file).
scheduler Default: DPMSolverMultistepScheduler
Algorithm used to generate the video frames.
flow_shift Type: integer, Default: 9, Range: 0 - 20
Video continuity factor (flow).
frame_rate Type: integer, Default: 16, Range: 1 - 60
Video frame rate.
num_frames Type: integer, Default: 33, Range: 1 - 1440
Number of frames in the resulting video. Together with frame_rate, this sets the duration.
enhance_end Type: number, Default: 1, Range: 0 - 1
When to end enhancement in the video. Must be greater than enhance_start.
enhance_start Type: number, Default: 0, Range: 0 - 1
When to start enhancement in the video. Must be less than enhance_end.
force_offload Type: boolean, Default: true
Whether to force model layers to be offloaded to CPU.
lora_strength Type: number, Default: 1, Range: -10 - 10
Scale/strength for your LoRA.
enhance_double Type: boolean, Default: true
Apply enhancement across frame pairs.
enhance_single Type: boolean, Default: true
Apply enhancement to individual frames.
enhance_weight Type: number, Default: 0.3, Range: 0 - 2
Strength of the video enhancement effect.
guidance_scale Type: number, Default: 6, Range: 0 - 30
How strongly the text prompt steers generation. Higher values follow the prompt more closely.
denoise_strength Type: number, Default: 1, Range: 0 - 2
Controls how strongly noise is applied each step.
replicate_weights Type: string
A .tar file containing LoRA weights from Replicate.
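The documented ranges can be checked locally before a request is sent. The following is a minimal sketch, assuming the ranges listed above are the ones the service enforces. The `RANGES` table and the `validate_input` helper are illustrative names, not part of the model's API.

```python
# Ranges copied from the parameter list above (inclusive on both ends).
RANGES = {
    "crf": (0, 51),
    "steps": (1, 150),
    "width": (64, 1536),
    "height": (64, 1024),
    "flow_shift": (0, 20),
    "frame_rate": (1, 60),
    "num_frames": (1, 1440),
    "enhance_end": (0, 1),
    "enhance_start": (0, 1),
    "lora_strength": (-10, 10),
    "enhance_weight": (0, 2),
    "guidance_scale": (0, 30),
    "denoise_strength": (0, 2),
}


def validate_input(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means the payload passed."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in payload and not low <= payload[name] <= high:
            problems.append(f"{name}={payload[name]} is outside {low} - {high}")
    # The enhancement window must be non-empty: start strictly before end.
    start = payload.get("enhance_start", 0)
    end = payload.get("enhance_end", 1)
    if not start < end:
        problems.append("enhance_start must be less than enhance_end")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every out-of-range value at once.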
Output Schema

Output

Type: string, Format: uri
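Because the output is a URI string, saving the video amounts to downloading that URI. The following is a minimal sketch using only the Python standard library; `save_output` is an illustrative helper name, and it assumes the URI is reachable without extra authentication headers.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(uri: str, dest: str) -> Path:
    """Stream the file at `uri` to `dest` and return the destination path."""
    target = Path(dest)
    with urllib.request.urlopen(uri) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, so large videos are not held in memory
    return target
```

For example, `save_output(output_uri, "HunyuanVideo_00001.mp4")` would write the result next to the script, where `output_uri` stands for the string returned by a prediction.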

Example Execution Logs
Random seed set to: 2755017607
Checking inputs
====================================
Checking weights
✅ hunyuan_video_vae_bf16.safetensors exists in ComfyUI/models/vae
✅ hunyuan_video_720_fp8_e4m3fn.safetensors exists in ComfyUI/models/diffusion_models
====================================
Running workflow
[ComfyUI] got prompt
Executing node 7, title: HunyuanVideo VAE Loader, class type: HyVideoVAELoader
Executing node 42, title: HunyuanVideo Enhance A Video, class type: HyVideoEnhanceAVideo
Executing node 16, title: (Down)Load HunyuanVideo TextEncoder, class type: DownloadAndLoadHyVideoTextEncoder
[ComfyUI] Loading text encoder model (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Loading text encoder model (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
[ComfyUI]
[ComfyUI] Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
[ComfyUI] Loading checkpoint shards:  25%|██▌       | 1/4 [00:00<00:01,  1.78it/s]
[ComfyUI] Loading checkpoint shards:  50%|█████     | 2/4 [00:01<00:01,  1.75it/s]
[ComfyUI] Loading checkpoint shards:  75%|███████▌  | 3/4 [00:01<00:00,  1.76it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.58it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.21it/s]
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
Executing node 30, title: HunyuanVideo TextEncode, class type: HyVideoTextEncode
[ComfyUI] llm prompt attention_mask shape: torch.Size([1, 161]), masked tokens: 52
[ComfyUI] clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 54
Executing node 41, title: HunyuanVideo Lora Select, class type: HyVideoLoraSelect
Executing node 1, title: HunyuanVideo Model Loader, class type: HyVideoModelLoader
[ComfyUI] model_type FLOW
[ComfyUI] The config attributes {'use_flow_sigmas': True, 'prediction_type': 'flow_prediction'} were passed to FlowMatchDiscreteScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
[ComfyUI] Using accelerate to load and assign model weights to device...
[ComfyUI] Loading LoRA: lora with strength: 1.0
[ComfyUI] Requested to load HyVideoModel
[ComfyUI] loaded completely 9.5367431640625e+25 12555.953247070312 True
[ComfyUI] Input (height, width, video_length) = (512, 512, 33)
Executing node 3, title: HunyuanVideo Sampler, class type: HyVideoSampler
[ComfyUI] The config attributes {'reverse': True, 'solver': 'euler'} were passed to DPMSolverMultistepScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
[ComfyUI] Sampling 33 frames in 9 latents at 512x512 with 30 inference steps
[ComfyUI] Scheduler config: FrozenDict([('num_train_timesteps', 1000), ('flow_shift', 9.0), ('reverse', True), ('solver', 'euler'), ('n_tokens', None), ('_use_default_values', ['n_tokens', 'num_train_timesteps'])])
[ComfyUI]
[ComfyUI] 0%|          | 0/30 [00:00<?, ?it/s]
[ComfyUI] 3%|▎         | 1/30 [00:01<00:33,  1.16s/it]
[ComfyUI] 7%|▋         | 2/30 [00:01<00:26,  1.04it/s]
[ComfyUI] 10%|█         | 3/30 [00:03<00:26,  1.00it/s]
[ComfyUI] 13%|█▎        | 4/30 [00:04<00:26,  1.01s/it]
[ComfyUI] 17%|█▋        | 5/30 [00:05<00:25,  1.02s/it]
[ComfyUI] 20%|██        | 6/30 [00:06<00:24,  1.03s/it]
[ComfyUI] 23%|██▎       | 7/30 [00:07<00:23,  1.03s/it]
[ComfyUI] 27%|██▋       | 8/30 [00:08<00:22,  1.03s/it]
[ComfyUI] 30%|███       | 9/30 [00:09<00:21,  1.03s/it]
[ComfyUI] 33%|███▎      | 10/30 [00:10<00:20,  1.04s/it]
[ComfyUI] 37%|███▋      | 11/30 [00:11<00:19,  1.04s/it]
[ComfyUI] 40%|████      | 12/30 [00:12<00:18,  1.04s/it]
[ComfyUI] 43%|████▎     | 13/30 [00:13<00:18,  1.08s/it]
[ComfyUI] 47%|████▋     | 14/30 [00:14<00:17,  1.06s/it]
[ComfyUI] 50%|█████     | 15/30 [00:15<00:15,  1.06s/it]
[ComfyUI] 53%|█████▎    | 16/30 [00:16<00:14,  1.05s/it]
[ComfyUI] 57%|█████▋    | 17/30 [00:17<00:13,  1.05s/it]
[ComfyUI] 60%|██████    | 18/30 [00:18<00:12,  1.05s/it]
[ComfyUI] 63%|██████▎   | 19/30 [00:19<00:11,  1.04s/it]
[ComfyUI] 67%|██████▋   | 20/30 [00:20<00:10,  1.04s/it]
[ComfyUI] 70%|███████   | 21/30 [00:21<00:09,  1.04s/it]
[ComfyUI] 73%|███████▎  | 22/30 [00:22<00:08,  1.04s/it]
[ComfyUI] 77%|███████▋  | 23/30 [00:23<00:07,  1.04s/it]
[ComfyUI] 80%|████████  | 24/30 [00:24<00:06,  1.04s/it]
[ComfyUI] 83%|████████▎ | 25/30 [00:26<00:05,  1.04s/it]
[ComfyUI] 87%|████████▋ | 26/30 [00:27<00:04,  1.04s/it]
[ComfyUI] 90%|█████████ | 27/30 [00:28<00:03,  1.04s/it]
[ComfyUI] 93%|█████████▎| 28/30 [00:29<00:02,  1.04s/it]
[ComfyUI] 97%|█████████▋| 29/30 [00:30<00:01,  1.04s/it]
[ComfyUI] 100%|██████████| 30/30 [00:31<00:00,  1.04s/it]
[ComfyUI] 100%|██████████| 30/30 [00:31<00:00,  1.04s/it]
[ComfyUI] Allocated memory: memory=12.757 GB
[ComfyUI] Max allocated memory: max_memory=14.346 GB
[ComfyUI] Max reserved memory: max_reserved=14.812 GB
Executing node 5, title: HunyuanVideo Decode, class type: HyVideoDecode
[ComfyUI]
[ComfyUI] Decoding rows:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows:  33%|███▎      | 1/3 [00:00<00:01,  1.80it/s]
[ComfyUI] Decoding rows:  67%|██████▋   | 2/3 [00:01<00:00,  1.74it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:01<00:00,  2.00it/s]
[ComfyUI]
[ComfyUI] Blending tiles:   0%|          | 0/3 [00:00<?, ?it/s]
Executing node 34, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 51.92it/s]
[ComfyUI] Prompt executed in 62.51 seconds
outputs:  {'34': {'gifs': [{'filename': 'HunyuanVideo_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4', 'frame_rate': 15.0, 'workflow': 'HunyuanVideo_00001.png', 'fullpath': '/tmp/outputs/HunyuanVideo_00001.mp4'}]}}
====================================
HunyuanVideo_00001.png
HunyuanVideo_00001.mp4
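Two numbers in this run can be reproduced with simple arithmetic. The clip length is num_frames divided by frame_rate, and the log line "Sampling 33 frames in 9 latents" is consistent with a temporal compression of 4, where latents = (frames - 1) / 4 + 1. The sketch below encodes both. The stride of 4 is an assumption inferred from that log line, and the function names are illustrative.

```python
def clip_seconds(num_frames: int, frame_rate: int) -> float:
    """Duration of the encoded clip in seconds."""
    return num_frames / frame_rate


def latent_frames(num_frames: int, temporal_stride: int = 4) -> int:
    """Latent frames sampled for a clip, assuming the first frame is kept
    and each later group of `temporal_stride` frames shares one latent."""
    return (num_frames - 1) // temporal_stride + 1


print(clip_seconds(33, 15))  # 2.2
print(latent_frames(33))     # 9
```

Under this assumption, frame counts of the form 4n + 1 (such as 33) map onto whole latents with no remainder.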
Version Details
Version ID
0e946318f53ed9d89a75cc48c6697a696f2b5e8981e74507a76ab557a938783d
Version Created
January 24, 2025