camenduru/magic-dance 🔢🖼️❓ → 🖼️

▶️ 667 runs 📅 Feb 2024 ⚙️ Cog 0.9.4 🔗 GitHub 📄 Paper ⚖️ License
dance-video facial-expression-transfer motion-transfer pose-sequence video-generation

About

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

Example Output

Performance Metrics

237.37s Prediction Time
482.11s Total Time
All Input Parameters
{
  "seed": 4,
  "image_path": "https://replicate.delivery/pbxt/KPwpjZxbmaIpIwWJjPI9TT5htDT8PuqKIKuekCvZFs98sH2e/CMQVuHLVEAEEHxw.jpg",
  "pose_sequence": "001",
  "num_train_steps": 1
}
Input Parameters
seed Type: integer Default: 42
image_path (required) Type: string
Input Image
pose_sequence Type: string Default: 001
num_train_steps Type: integer Default: 1 Range: 1 - 5
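The schema above can be checked client-side before submitting a prediction. Below is a minimal sketch: the parameter names, defaults, and the 1–5 range come from this page, but the helper itself is illustrative and not part of the model's code.

```python
# Merge the page's documented defaults with user overrides and validate.
DEFAULTS = {"seed": 42, "pose_sequence": "001", "num_train_steps": 1}

def build_inputs(image_path, **overrides):
    if not image_path:
        raise ValueError("image_path is required")
    inputs = {**DEFAULTS, "image_path": image_path, **overrides}
    steps = inputs["num_train_steps"]
    if not (isinstance(steps, int) and 1 <= steps <= 5):
        raise ValueError("num_train_steps must be an integer in [1, 5]")
    return inputs

inputs = build_inputs("https://example.com/reference.jpg", seed=4)
print(inputs)
```

An actual prediction would then pass this dict to the Replicate client, e.g. `replicate.run("camenduru/magic-dance:<version-id>", input=inputs)` with the version ID listed under Version Details.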
Output Schema

Output

Type: string Format: uri
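Since the output is a plain URI string, fetching the result is a one-liner with the standard library. The helper below is a hypothetical sketch; it is demonstrated with a `data:` URI so it runs offline, standing in for the real `https://replicate.delivery/...` output URL.

```python
import urllib.request
from pathlib import Path

def save_output(uri, dest):
    # Fetch the prediction's output URI and write the bytes to dest.
    with urllib.request.urlopen(uri) as resp:
        Path(dest).write_bytes(resp.read())
    return dest

# Offline stand-in for a real output URL (urllib handles data: URIs natively).
save_output("data:,hello", "/tmp/magic_dance_output.bin")
print(Path("/tmp/magic_dance_output.bin").read_text())  # prints "hello"
```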

Example Execution Logs
/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg 1 4 001
['CUDA_VISIBLE_DEVICES=0', 'torchrun', '--master_port', '18102', 'test_any_image_pose.py', '--model_config', '/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml', '--num_train_steps', '1', '--seed', '4', '--img_bin_limit', 'all', '--train_batch_size', '1', '--use_fp16', '--control_mode', 'controlnet_important', '--control_type', 'body+hand+face', '--train_dataset', 'tiktok_video_arnold', '--v4', '--with_text', '--wonoise', '--image_pretrain_dir', '/content/MagicDance/pretrained_weights/model_state-110000.th', '--init_path', '/content/MagicDance/pretrained_weights/control_sd15_ini.ckpt', '--local_image_dir', '/content/MagicDance/tiktok_test_log/image_log/0125/001/image', '--local_log_dir', '/content/MagicDance/tiktok_test_log/tb_log/0125/001/log', '--local_pose_path', '/content/MagicDance/example_data/pose_sequence/001', '--local_cond_image_path', '/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg'] Moving 0 files to the new cache system
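The log above shows the predictor shelling out to `torchrun`. How that argument list is assembled from the inputs can be sketched as follows; the flag names and paths are copied from the logged command, but the helper function itself is a hypothetical reconstruction, not the model's actual code.

```python
def build_command(seed, num_train_steps, pose_sequence, cond_image_path,
                  master_port=18102, root="/content/MagicDance"):
    # Illustrative reconstruction of the logged torchrun invocation
    # (only a subset of the logged flags is shown).
    return [
        "torchrun", "--master_port", str(master_port), "test_any_image_pose.py",
        "--model_config", f"{root}/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml",
        "--num_train_steps", str(num_train_steps),
        "--seed", str(seed),
        "--use_fp16",
        "--control_mode", "controlnet_important",
        "--control_type", "body+hand+face",
        "--local_pose_path", f"{root}/example_data/pose_sequence/{pose_sequence}",
        "--local_cond_image_path", cond_image_path,
    ]

cmd = build_command(seed=4, num_train_steps=1, pose_sequence="001",
                    cond_image_path="/tmp/input.jpg")
print(" ".join(cmd))
```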
TORCH_VERSION=2 FP16_DTYPE=torch.bfloat16
Customize text prompt: None
Namespace(model_config='/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml', reinit_hint_block=False, image_size=64, empty_text_prob=0.1, sd_locked=True, only_mid_control=False, finetune_all=False, finetune_imagecond_unet=False, control_type=['body+hand+face'], control_dropout=0.0, depth_bg_threshold=0.0, inpaint_unet=False, blank_mask_prob=0.0, mask_densepose=0.0, control_mode='controlnet_important', wonoise=True, mask_bg=False, img_bin_limit='all', num_workers=1, train_batch_size=1, val_batch_size=1, lr=1e-05, lr_sd=1e-05, weight_decay=0, lr_anneal_steps=0, ema_rate=0, num_train_steps=1, grad_clip_norm=0.5, gradient_accumulation_steps=1, seed=4, logging_steps=100, logging_gen_steps=1000, save_steps=10000, save_total_limit=100, use_fp16=True, global_step=0, load_optimizer_state=True, compile=False, with_text=False, pose_transfer=False, eta=0.0, autoreg=False, gif_time=0.03, text_prompt=None, v4=True, train_dataset='tiktok_video_arnold', output_dir=None, local_log_dir='/content/MagicDance/tiktok_test_log/tb_log/0125/001/log', local_image_dir='/content/MagicDance/tiktok_test_log/image_log/0125/001/image', resume_dir=None, image_pretrain_dir='/content/MagicDance/pretrained_weights/model_state-110000.th', pose_pretrain_dir=None, init_path='/content/MagicDance/pretrained_weights/control_sd15_ini.ckpt', local_cond_image_path='/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg', local_pose_path='/content/MagicDance/example_data/pose_sequence/001', world_size=1, local_rank=0, rank=0, device=device(type='cuda', index=0))
ControlLDMReferenceOnlyPose: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads.
[... same MemoryEfficientCrossAttention setup repeated for query dims 320/640/1280 across the remaining UNet blocks ...]
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
()
{'linear_start': 0.00085, 'linear_end': 0.012, 'num_timesteps_cond': 1, 'log_every_t': 200, 'timesteps': 1000, 'first_stage_key': 'jpg', 'cond_stage_key': 'txt', 'image_size': 64, 'channels': 4, 'cond_stage_trainable': False, 'conditioning_key': 'crossattn', 'monitor': 'val/loss_simple_ema', 'scale_factor': 0.18215, 'use_ema': False, 'unet_config': {'target': 'model_lib.ControlNet.cldm.cldm.ControlledUnetModelAttnPose', 'params': {'image_size': 32, 'in_channels': 4, 'out_channels': 4, 'model_channels': 320, 'attention_resolutions': [4, 2, 1], 'num_res_blocks': 2, 'channel_mult': [1, 2, 4, 4], 'num_heads': 8, 'use_spatial_transformer': True, 'transformer_depth': 1, 'context_dim': 768, 'use_checkpoint': True, 'legacy': False}}, 'first_stage_config': {'target': 'model_lib.ControlNet.ldm.models.autoencoder.AutoencoderKL', 'params': {'embed_dim': 4, 'monitor': 'val/rec_loss', 'ddconfig': {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0}, 'lossconfig': {'target': 'torch.nn.Identity'}}}, 'cond_stage_config': {'target': 'model_lib.ControlNet.ldm.modules.encoders.modules.FrozenCLIPEmbedder'}}
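The autoencoder config above fixes the VAE downsampling factor, which ties together the logged latent shapes: `ch_mult` has four levels, i.e. three 2x downsamples, for a factor of 8. A quick arithmetic check (the interpretation of the 64x64 sampling latents as 512x512 pixel output is the standard SD 1.5 convention, inferred rather than stated on this page):

```python
# VAE downsampling factor implied by ddconfig above.
ch_mult = [1, 2, 4, 4]            # from the first_stage_config ddconfig
factor = 2 ** (len(ch_mult) - 1)  # one 2x downsample between levels -> 8

print(factor)           # 8
print(256 // factor)    # 32: matches "Working with z of shape (1, 4, 32, 32)"
print(64 * factor)      # 512: the 64x64 DDIM latents decode to 512x512 images
```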
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 8 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 768 and using 8 heads.
[... same MemoryEfficientCrossAttention setup repeated for query dims 320/640/1280 across the remaining blocks ...]
Loaded model config from [/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml]
Total base  parameters 2288.11M
find model state dict from /content/MagicDance/pretrained_weights/model_state-110000.th ...
Loading model state dict from /content/MagicDance/pretrained_weights/model_state-110000.th ...
Max memory allocated after creating DDP: 17550.0MB
[rank0] start training loop!
train steps: 1
args: Namespace(model_config='/content/MagicDance/model_lib/ControlNet/models/cldm_v15_reference_only_pose.yaml', reinit_hint_block=False, image_size=64, empty_text_prob=0.1, sd_locked=True, only_mid_control=False, finetune_all=False, finetune_imagecond_unet=False, control_type=['body+hand+face'], control_dropout=0.0, depth_bg_threshold=0.0, inpaint_unet=False, blank_mask_prob=0.0, mask_densepose=0.0, control_mode='controlnet_important', wonoise=True, mask_bg=False, img_bin_limit='all', num_workers=1, train_batch_size=1, val_batch_size=1, lr=1e-05, lr_sd=1e-05, weight_decay=0, lr_anneal_steps=0, ema_rate=0, num_train_steps=1, grad_clip_norm=0.5, gradient_accumulation_steps=1, seed=4, logging_steps=100, logging_gen_steps=1000, save_steps=10000, save_total_limit=100, use_fp16=True, global_step=0, load_optimizer_state=True, compile=False, with_text=False, pose_transfer=False, eta=0.0, autoreg=False, gif_time=0.03, text_prompt=None, v4=True, train_dataset='tiktok_video_arnold', output_dir=None, local_log_dir='/content/MagicDance/tiktok_test_log/tb_log/0125/001/log', local_image_dir='/content/MagicDance/tiktok_test_log/image_log/0125/001/image', resume_dir=None, image_pretrain_dir='/content/MagicDance/pretrained_weights/model_state-110000.th', pose_pretrain_dir=None, init_path='/content/MagicDance/pretrained_weights/control_sd15_ini.ckpt', local_cond_image_path='/tmp/tmpolwkshiqCMQVuHLVEAEEHxw.jpg', local_pose_path='/content/MagicDance/example_data/pose_sequence/001', world_size=1, local_rank=0, rank=0, device=device(type='cuda', index=0))
c_cat_list len:  25
Generate Image 0 in 25 images
Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
Generate Image 1 in 25 images
Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
[... identical three-line DDIM sampling log repeated for Images 2 through 24 ...]
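As the logs show, each of the 25 pose frames is denoised independently: a fresh (1, 4, 64, 64) latent, 50 deterministic DDIM steps (eta 0.0), so 25 x 50 = 1250 denoiser passes per run (more if classifier-free guidance doubles them). The loop shape can be sketched with a stand-in denoiser; `fake_denoise_step` is purely hypothetical, replacing the real ~860M-parameter U-Net.

```python
import numpy as np

def fake_denoise_step(latent, t):
    # Stand-in for one U-Net denoising pass; not the real model.
    return latent * 0.99

def sample_frames(num_frames=25, ddim_steps=50, shape=(1, 4, 64, 64), seed=4):
    rng = np.random.default_rng(seed)
    frames, total_passes = [], 0
    for _ in range(num_frames):
        latent = rng.standard_normal(shape)    # start each frame from noise
        for t in reversed(range(ddim_steps)):  # 50 DDIM timesteps, eta=0.0
            latent = fake_denoise_step(latent, t)
            total_passes += 1
        frames.append(latent)
    return frames, total_passes

frames, passes = sample_frames()
print(len(frames), passes)  # 25 1250
```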
Version Details
Version ID
a59027b2c4908585c82dce3a79763bb90dfa8512d758d97ca3818879768d745e
Version Created
February 16, 2024
Run on Replicate →