camenduru/magic-dance
About
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

Example Output
Performance Metrics
- Prediction Time: 237.37s
- Total Time: 482.11s
All Input Parameters
{ "seed": 4, "image_path": "https://replicate.delivery/pbxt/KPwpjZxbmaIpIwWJjPI9TT5htDT8PuqKIKuekCvZFs98sH2e/CMQVuHLVEAEEHxw.jpg", "pose_sequence": "001", "num_train_steps": 1 }
Input Parameters
- seed
- image_path (required): Input Image
- pose_sequence
- num_train_steps
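For programmatic use, here is a minimal sketch of running this version with the Replicate Python client, using the example inputs above and the version ID listed under Version Details; it assumes the replicate package is installed and a REPLICATE_API_TOKEN environment variable is set.

import replicate

# Run the version listed under "Version Details" with the same inputs as the
# example prediction above.
output = replicate.run(
    "camenduru/magic-dance:a59027b2c4908585c82dce3a79763bb90dfa8512d758d97ca3818879768d745e",
    input={
        "seed": 4,
        "image_path": "https://replicate.delivery/pbxt/KPwpjZxbmaIpIwWJjPI9TT5htDT8PuqKIKuekCvZFs98sH2e/CMQVuHLVEAEEHxw.jpg",
        "pose_sequence": "001",
        "num_train_steps": 1,
    },
)

# The output type depends on the model's schema (not reproduced on this page),
# so simply inspect whatever comes back.
print(output)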
Output Schema
Output
Example Execution Logs
/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg 1 4 001
['CUDA_VISIBLE_DEVICES=0', 'torchrun', '--master_port', '18102', 'test_any_image_pose.py', '--model_config', '/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml', '--num_train_steps', '1', '--seed', '4', '--img_bin_limit', 'all', '--train_batch_size', '1', '--use_fp16', '--control_mode', 'controlnet_important', '--control_type', 'body+hand+face', '--train_dataset', 'tiktok_video_arnold', '--v4', '--with_text', '--wonoise', '--image_pretrain_dir', '/content/MagicDance/pretrained_weights/model_state-110000.th', '--init_path', '/content/MagicDance/pretrained_weights/control_sd15_ini.ckpt', '--local_image_dir', '/content/MagicDance/tiktok_test_log/image_log/0125/001/image', '--local_log_dir', '/content/MagicDance/tiktok_test_log/tb_log/0125/001/log', '--local_pose_path', '/content/MagicDance/example_data/pose_sequence/001', '--local_cond_image_path', '/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg']
Moving 0 files to the new cache system
TORCH_VERSION=2 FP16_DTYPE=torch.bfloat16
Customize text prompt: None
Namespace(model_config='/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml', reinit_hint_block=False, image_size=64, empty_text_prob=0.1, sd_locked=True, only_mid_control=False, finetune_all=False, finetune_imagecond_unet=False, control_type=['body+hand+face'], control_dropout=0.0, depth_bg_threshold=0.0, inpaint_unet=False, blank_mask_prob=0.0, mask_densepose=0.0, control_mode='controlnet_important', wonoise=True, mask_bg=False, img_bin_limit='all', num_workers=1, train_batch_size=1, val_batch_size=1, lr=1e-05, lr_sd=1e-05, weight_decay=0, lr_anneal_steps=0, ema_rate=0, num_train_steps=1, grad_clip_norm=0.5, gradient_accumulation_steps=1, seed=4, logging_steps=100, logging_gen_steps=1000, save_steps=10000, save_total_limit=100, use_fp16=True, global_step=0, load_optimizer_state=True, compile=False, with_text=False, pose_transfer=False, eta=0.0, autoreg=False, gif_time=0.03, text_prompt=None, v4=True, train_dataset='tiktok_video_arnold', output_dir=None, local_log_dir='/content/MagicDance/tiktok_test_log/tb_log/0125/001/log', local_image_dir='/content/MagicDance/tiktok_test_log/image_log/0125/001/image', resume_dir=None, image_pretrain_dir='/content/MagicDance/pretrained_weights/model_state-110000.th', pose_pretrain_dir=None, init_path='/content/MagicDance/pretrained_weights/control_sd15_ini.ckpt', local_cond_image_path='/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg', local_pose_path='/content/MagicDance/example_data/pose_sequence/001', world_size=1, local_rank=0, rank=0, device=device(type='cuda', index=0))
ControlLDMReferenceOnlyPose: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads.
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels... Working with z of shape (1, 4, 32, 32) = 4096 dimensions. making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels...
() {'linear_start': 0.00085, 'linear_end': 0.012, 'num_timesteps_cond': 1, 'log_every_t': 200, 'timesteps': 1000, 'first_stage_key': 'jpg', 'cond_stage_key': 'txt', 'image_size': 64, 'channels': 4, 'cond_stage_trainable': False, 'conditioning_key': 'crossattn', 'monitor': 'val/loss_simple_ema', 'scale_factor': 0.18215, 'use_ema': False, 'unet_config': {'target': 'model_lib.ControlNet.cldm.cldm.ControlledUnetModelAttnPose', 'params': {'image_size': 32, 'in_channels': 4, 'out_channels': 4, 'model_channels': 320, 'attention_resolutions': [4, 2, 1], 'num_res_blocks': 2, 'channel_mult': [1, 2, 4, 4], 'num_heads': 8, 'use_spatial_transformer': True, 'transformer_depth': 1, 'context_dim': 768, 'use_checkpoint': True, 'legacy': False}}, 'first_stage_config': {'target': 'model_lib.ControlNet.ldm.models.autoencoder.AutoencoderKL', 'params': {'embed_dim': 4, 'monitor': 'val/rec_loss', 'ddconfig': {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0}, 'lossconfig': {'target': 'torch.nn.Identity'}}}, 'cond_stage_config': {'target': 'model_lib.ControlNet.ldm.modules.encoders.modules.FrozenCLIPEmbedder'}}
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 8 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 768 and using 8 heads.
Loaded model config from [/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml]
Total base parameters 2288.11M
find model state dict from /content/MagicDance/pretrained_weights/model_state-110000.th ...
Loading model state dict from /content/MagicDance/pretrained_weights/model_state-110000.th ...
Max memory allocated after creating DDP: 17550.0MB
[rank0] start training loop!
train steps: 1
args: Namespace(model_config='/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml', reinit_hint_block=False, image_size=64, empty_text_prob=0.1, sd_locked=True, only_mid_control=False, finetune_all=False, finetune_imagecond_unet=False, control_type=['body+hand+face'], control_dropout=0.0, depth_bg_threshold=0.0, inpaint_unet=False, blank_mask_prob=0.0, mask_densepose=0.0, control_mode='controlnet_important', wonoise=True, mask_bg=False, img_bin_limit='all', num_workers=1, train_batch_size=1, val_batch_size=1, lr=1e-05, lr_sd=1e-05, weight_decay=0, lr_anneal_steps=0, ema_rate=0, num_train_steps=1, grad_clip_norm=0.5, gradient_accumulation_steps=1, seed=4, logging_steps=100, logging_gen_steps=1000, save_steps=10000, save_total_limit=100, use_fp16=True, global_step=0, load_optimizer_state=True, compile=False, with_text=False, pose_transfer=False, eta=0.0, autoreg=False, gif_time=0.03, text_prompt=None, v4=True, train_dataset='tiktok_video_arnold', output_dir=None, local_log_dir='/content/MagicDance/tiktok_test_log/tb_log/0125/001/log', local_image_dir='/content/MagicDance/tiktok_test_log/image_log/0125/001/image', resume_dir=None, image_pretrain_dir='/content/MagicDance/pretrained_weights/model_state-110000.th', pose_pretrain_dir=None, init_path='/content/MagicDance/pretrained_weights/control_sd15_ini.ckpt', local_cond_image_path='/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg', local_pose_path='/content/MagicDance/example_data/pose_sequence/001', world_size=1, local_rank=0, rank=0, device=device(type='cuda', index=0))
c_cat_list len: 25
Generate Image 0 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 1 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 2 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 3 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 4 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 5 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 6 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 7 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 8 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 9 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 10 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 11 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 12 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 13 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 14 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 15 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 16 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 17 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 18 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 19 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 20 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 21 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 22 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 23 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
Generate Image 24 in 25 images Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0 Running DDIM Sampling with 50 timesteps
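The log above shows the exact torchrun command the predictor assembles from the inputs. As a minimal sketch, the same command could be launched directly with Python's subprocess module, assuming a local MagicDance checkout at /content/MagicDance, the pretrained weights downloaded to pretrained_weights/, and the conditioning image saved to the temporary path shown in the log.

import os
import subprocess

# The torchrun invocation taken verbatim from the example log; all paths are
# from that log and must exist locally for this to run.
cmd = [
    "torchrun", "--master_port", "18102", "test_any_image_pose.py",
    "--model_config", "/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml",
    "--num_train_steps", "1",
    "--seed", "4",
    "--img_bin_limit", "all",
    "--train_batch_size", "1",
    "--use_fp16",
    "--control_mode", "controlnet_important",
    "--control_type", "body+hand+face",
    "--train_dataset", "tiktok_video_arnold",
    "--v4", "--with_text", "--wonoise",
    "--image_pretrain_dir", "/content/MagicDance/pretrained_weights/model_state-110000.th",
    "--init_path", "/content/MagicDance/pretrained_weights/control_sd15_ini.ckpt",
    "--local_image_dir", "/content/MagicDance/tiktok_test_log/image_log/0125/001/image",
    "--local_log_dir", "/content/MagicDance/tiktok_test_log/tb_log/0125/001/log",
    "--local_pose_path", "/content/MagicDance/example_data/pose_sequence/001",
    "--local_cond_image_path", "/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg",
]

# CUDA_VISIBLE_DEVICES=0 is prepended in the logged command; pass it via the
# environment rather than as an argv entry.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
subprocess.run(cmd, cwd="/content/MagicDance", env=env, check=True)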
Version Details
- Version ID: a59027b2c4908585c82dce3a79763bb90dfa8512d758d97ca3818879768d745e
- Version Created: February 16, 2024