lucataco/hunyuan-heygen-woman-2

253 runs · Jan 2025 · Cog 0.13.6
avatar-generation text-to-video video-consistent-character-generation

About

A HunyuanVideo fine-tune trained on an AI avatar from HeyGen.

Example Output

Prompt:

"HGW2 woman sitting on a beige couch in a well-decorated room. She is wearing a light-colored, long-sleeved turtleneck top and has long, straight brown hair. The couch is adorned with several throw pillows, each with a black and white geometric pattern. The background includes a wooden chair with a yellow cushion, a wooden side table, and a large mirror with a wooden frame. The room has a warm and cozy atmosphere, with soft lighting and a comfortable ambiance. The woman appears to be speaking or presenting something, as she is looking directly at the camera with a neutral expression"

Output

Performance Metrics

208.05s Prediction Time
239.19s Total Time
All Input Parameters
{
  "crf": 19,
  "steps": 30,
  "width": 960,
  "height": 544,
  "prompt": "HGW2 woman sitting on a beige couch in a well-decorated room. She is wearing a light-colored, long-sleeved turtleneck top and has long, straight brown hair. The couch is adorned with several throw pillows, each with a black and white geometric pattern. The background includes a wooden chair with a yellow cushion, a wooden side table, and a large mirror with a wooden frame. The room has a warm and cozy atmosphere, with soft lighting and a comfortable ambiance. The woman appears to be speaking or presenting something, as she is looking directly at the camera with a neutral expression",
  "lora_url": "",
  "flow_shift": 9,
  "frame_rate": 20,
  "num_frames": 49,
  "force_offload": true,
  "lora_strength": 0.9,
  "guidance_scale": 6,
  "denoise_strength": 1
}
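The parameters above can be submitted through the Replicate Python client. A minimal sketch, assuming the `replicate` package and a `REPLICATE_API_TOKEN` in your environment (the version ID is the one listed under Version Details below; the prompt is shortened here for brevity):

```python
# Inputs copied from "All Input Parameters" above (prompt truncated here).
input_params = {
    "crf": 19,
    "steps": 30,
    "width": 960,
    "height": 544,
    "prompt": "HGW2 woman sitting on a beige couch in a well-decorated room. ...",
    "flow_shift": 9,
    "frame_rate": 20,
    "num_frames": 49,
    "force_offload": True,
    "lora_strength": 0.9,
    "guidance_scale": 6,
    "denoise_strength": 1,
}

# Model reference: owner/name:version-id, as listed on this page.
MODEL_REF = (
    "lucataco/hunyuan-heygen-woman-2:"
    "923b4f49c7a0a882abb89494ac38fd902ef60640f742eb478bae94f225439ab4"
)

def generate():
    # Lazy import so the sketch can be read without the client installed.
    import replicate  # pip install replicate
    # replicate.run blocks until the prediction finishes; per the output
    # schema below, this model returns a single URI string for the MP4.
    return replicate.run(MODEL_REF, input=input_params)
```

Note that a full run of this example takes roughly 3.5 minutes of prediction time (see Performance Metrics above).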
Input Parameters
crf · Type: integer · Default: 19 · Range: 0–51
CRF (quality) for H.264 encoding. Lower values = higher quality.
seed · Type: integer
Set a seed for reproducibility. Random by default.
steps · Type: integer · Default: 50 · Range: 1–150
Number of diffusion steps.
width · Type: integer · Default: 640 · Range: 64–1536
Width of the generated video.
height · Type: integer · Default: 360 · Range: 64–1024
Height of the generated video.
prompt · Type: string · Default: (empty)
The text prompt describing your video scene.
lora_url · Type: string · Default: (empty)
A URL pointing to your LoRA .safetensors file, or a Hugging Face repo (e.g. 'user/repo'; the first .safetensors file is used).
scheduler · Default: DPMSolverMultistepScheduler
Algorithm used to generate the video frames.
flow_shift · Type: integer · Default: 9 · Range: 0–20
Video continuity factor (flow).
frame_rate · Type: integer · Default: 16 · Range: 1–60
Video frame rate.
num_frames · Type: integer · Default: 33 · Range: 1–1440
How many frames (i.e. the duration) the resulting video has.
enhance_end · Type: number · Default: 1 · Range: 0–1
When to end enhancement in the video. Must be greater than enhance_start.
enhance_start · Type: number · Default: 0 · Range: 0–1
When to start enhancement in the video. Must be less than enhance_end.
force_offload · Type: boolean · Default: true
Whether to force offloading of model layers to the CPU.
lora_strength · Type: number · Default: 1 · Range: −10 to 10
Scale/strength for your LoRA.
enhance_double · Type: boolean · Default: true
Apply enhancement across frame pairs.
enhance_single · Type: boolean · Default: true
Apply enhancement to individual frames.
enhance_weight · Type: number · Default: 0.3 · Range: 0–2
Strength of the video enhancement effect.
guidance_scale · Type: number · Default: 6 · Range: 0–30
Guidance scale: how strongly the text prompt steers generation.
denoise_strength · Type: number · Default: 1 · Range: 0–2
Controls how strongly noise is applied at each step.
replicate_weights · Type: string
A .tar file containing LoRA weights from Replicate.
Output Schema

Output

Type: stringFormat: uri
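Since the output is a single URI string, saving the generated video locally is a plain download. A small sketch (the destination filename `output.mp4` is an arbitrary choice):

```python
import pathlib
import urllib.request

def save_video(uri: str, dest: str = "output.mp4") -> str:
    # Fetch the bytes behind the output URI and write them to disk.
    with urllib.request.urlopen(uri) as resp:
        pathlib.Path(dest).write_bytes(resp.read())
    return dest
```

This pairs with the prediction call: pass the returned URI straight to `save_video`.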

Example Execution Logs
Random seed set to: 3841941661
Checking inputs
====================================
Checking weights
✅ hunyuan_video_720_fp8_e4m3fn.safetensors exists in ComfyUI/models/diffusion_models
✅ hunyuan_video_vae_bf16.safetensors exists in ComfyUI/models/vae
====================================
Running workflow
[ComfyUI] got prompt
Executing node 7, title: HunyuanVideo VAE Loader, class type: HyVideoVAELoader
[ComfyUI] Loading text encoder model (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
Executing node 16, title: (Down)Load HunyuanVideo TextEncoder, class type: DownloadAndLoadHyVideoTextEncoder
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (clipL) from: /src/ComfyUI/models/clip/clip-vit-large-patch14
[ComfyUI] Loading text encoder model (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
[ComfyUI]
[ComfyUI] Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
[ComfyUI] Loading checkpoint shards:  25%|██▌       | 1/4 [00:00<00:02,  1.43it/s]
[ComfyUI] Loading checkpoint shards:  50%|█████     | 2/4 [00:01<00:01,  1.23it/s]
[ComfyUI] Loading checkpoint shards:  75%|███████▌  | 3/4 [00:02<00:00,  1.13it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.57it/s]
[ComfyUI] Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.42it/s]
[ComfyUI] Text encoder to dtype: torch.float16
[ComfyUI] Loading tokenizer (llm) from: /src/ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
Executing node 30, title: HunyuanVideo TextEncode, class type: HyVideoTextEncode
[ComfyUI] llm prompt attention_mask shape: torch.Size([1, 161]), masked tokens: 123
[ComfyUI] clipL prompt attention_mask shape: torch.Size([1, 77]), masked tokens: 77
Executing node 41, title: HunyuanVideo Lora Select, class type: HyVideoLoraSelect
Executing node 1, title: HunyuanVideo Model Loader, class type: HyVideoModelLoader
[ComfyUI] model_type FLOW
[ComfyUI] Using accelerate to load and assign model weights to device...
[ComfyUI] Loading LoRA: lora_comfyui with strength: 0.9
[ComfyUI] Requested to load HyVideoModel
[ComfyUI] Loading 1 new model
[ComfyUI] loaded completely 0.0 12555.953247070312 True
[ComfyUI] Input (height, width, video_length) = (544, 960, 49)
Executing node 3, title: HunyuanVideo Sampler, class type: HyVideoSampler
[ComfyUI] Sampling 49 frames in 13 latents at 960x544 with 30 inference steps
[ComfyUI] Scheduler config: FrozenDict([('num_train_timesteps', 1000), ('shift', 9.0), ('reverse', True), ('solver', 'euler'), ('n_tokens', None), ('_use_default_values', ['num_train_timesteps', 'n_tokens'])])
[ComfyUI] 0%|          | 0/30 [00:00<?, ?it/s]
[ComfyUI] 3%|▎         | 1/30 [00:04<01:58,  4.09s/it]
[ComfyUI] 7%|▋         | 2/30 [00:09<02:17,  4.92s/it]
[ComfyUI] 10%|█         | 3/30 [00:15<02:19,  5.18s/it]
[ComfyUI] 13%|█▎        | 4/30 [00:20<02:17,  5.31s/it]
[ComfyUI] 17%|█▋        | 5/30 [00:26<02:14,  5.38s/it]
[ComfyUI] 20%|██        | 6/30 [00:31<02:10,  5.42s/it]
[ComfyUI] 23%|██▎       | 7/30 [00:37<02:05,  5.45s/it]
[ComfyUI] 27%|██▋       | 8/30 [00:42<02:00,  5.46s/it]
[ComfyUI] 30%|███       | 9/30 [00:48<01:55,  5.48s/it]
[ComfyUI] 33%|███▎      | 10/30 [00:53<01:49,  5.48s/it]
[ComfyUI] 37%|███▋      | 11/30 [00:59<01:44,  5.49s/it]
[ComfyUI] 40%|████      | 12/30 [01:04<01:38,  5.49s/it]
[ComfyUI] 43%|████▎     | 13/30 [01:10<01:33,  5.50s/it]
[ComfyUI] 47%|████▋     | 14/30 [01:15<01:27,  5.50s/it]
[ComfyUI] 50%|█████     | 15/30 [01:21<01:22,  5.50s/it]
[ComfyUI] 53%|█████▎    | 16/30 [01:26<01:17,  5.50s/it]
[ComfyUI] 57%|█████▋    | 17/30 [01:32<01:11,  5.51s/it]
[ComfyUI] 60%|██████    | 18/30 [01:37<01:06,  5.50s/it]
[ComfyUI] 63%|██████▎   | 19/30 [01:43<01:00,  5.50s/it]
[ComfyUI] 67%|██████▋   | 20/30 [01:48<00:55,  5.51s/it]
[ComfyUI] 70%|███████   | 21/30 [01:54<00:49,  5.50s/it]
[ComfyUI] 73%|███████▎  | 22/30 [01:59<00:44,  5.50s/it]
[ComfyUI] 77%|███████▋  | 23/30 [02:05<00:38,  5.51s/it]
[ComfyUI] 80%|████████  | 24/30 [02:10<00:33,  5.51s/it]
[ComfyUI] 83%|████████▎ | 25/30 [02:16<00:27,  5.51s/it]
[ComfyUI] 87%|████████▋ | 26/30 [02:21<00:22,  5.50s/it]
[ComfyUI] 90%|█████████ | 27/30 [02:27<00:16,  5.50s/it]
[ComfyUI] 93%|█████████▎| 28/30 [02:32<00:11,  5.50s/it]
[ComfyUI] 97%|█████████▋| 29/30 [02:38<00:05,  5.50s/it]
[ComfyUI] 100%|██████████| 30/30 [02:43<00:00,  5.50s/it]
[ComfyUI] 100%|██████████| 30/30 [02:43<00:00,  5.46s/it]
[ComfyUI] Allocated memory: memory=12.760 GB
[ComfyUI] Max allocated memory: max_memory=18.839 GB
[ComfyUI] Max reserved memory: max_reserved=20.719 GB
Executing node 5, title: HunyuanVideo Decode, class type: HyVideoDecode
[ComfyUI]
[ComfyUI] Decoding rows:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Decoding rows:  33%|███▎      | 1/3 [00:01<00:03,  1.51s/it]
[ComfyUI] Decoding rows:  67%|██████▋   | 2/3 [00:03<00:01,  1.62s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:04<00:00,  1.40s/it]
[ComfyUI] Decoding rows: 100%|██████████| 3/3 [00:04<00:00,  1.45s/it]
[ComfyUI]
[ComfyUI] Blending tiles:   0%|          | 0/3 [00:00<?, ?it/s]
[ComfyUI] Blending tiles:  33%|███▎      | 1/3 [00:00<00:00,  7.82it/s]
Executing node 34, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
[ComfyUI] Blending tiles: 100%|██████████| 3/3 [00:00<00:00, 18.19it/s]
[ComfyUI] Prompt executed in 201.71 seconds
outputs:  {'34': {'gifs': [{'filename': 'HunyuanVideo_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4', 'frame_rate': 20.0, 'workflow': 'HunyuanVideo_00001.png', 'fullpath': '/tmp/outputs/HunyuanVideo_00001.mp4'}]}}
====================================
HunyuanVideo_00001.png
HunyuanVideo_00001.mp4
Version Details
Version ID
923b4f49c7a0a882abb89494ac38fd902ef60640f742eb478bae94f225439ab4
Version Created
January 13, 2025
Run on Replicate →