fofr/tooncrafter ✓🔢📝🖼️ → 🖼️
About
Create videos from illustrated input images
Example Output
(video player omitted; this example prediction produced ToonCrafter_00001.mp4, shown in the logs below)
Performance Metrics
- Prediction Time: 61.90s
- Total Time: 138.48s
All Input Parameters
{
"loop": false,
"prompt": "",
"image_1": "https://replicate.delivery/pbxt/L1pQdyf4fPVRzU5WxhhHAdH2Eo05X3zhirvNzwAKJ80lA7Qh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-0.webp",
"image_2": "https://replicate.delivery/pbxt/L1pQeBF582rKH3FFAYJCxdFUurBZ1axNFVwKxEd1wIALydhh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-1.webp",
"image_3": "https://replicate.delivery/pbxt/L1pQdTPwSZxnfDkPkM3eArBmHWd5xttTnSkKBhszXJ88pIff/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-3.webp",
"max_width": 512,
"max_height": 512,
"interpolate": false,
"negative_prompt": "",
"color_correction": true
}
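As a point of reference, here is a minimal sketch of reproducing this prediction with the official replicate Python client (pip install replicate, with the REPLICATE_API_TOKEN environment variable set). The model reference and version hash are taken from this page, and the input dictionary mirrors the JSON above:

import replicate

# Run fofr/tooncrafter, pinned to the version hash listed under
# "Version Details" at the bottom of this page.
output = replicate.run(
    "fofr/tooncrafter:0486ff07368e816ec3d5c69b9581e7a09b55817f567a0d74caad9395c9295c77",
    input={
        "loop": False,
        "prompt": "",
        "image_1": "https://replicate.delivery/pbxt/L1pQdyf4fPVRzU5WxhhHAdH2Eo05X3zhirvNzwAKJ80lA7Qh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-0.webp",
        "image_2": "https://replicate.delivery/pbxt/L1pQeBF582rKH3FFAYJCxdFUurBZ1axNFVwKxEd1wIALydhh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-1.webp",
        "image_3": "https://replicate.delivery/pbxt/L1pQdTPwSZxnfDkPkM3eArBmHWd5xttTnSkKBhszXJ88pIff/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-3.webp",
        "max_width": 512,
        "max_height": 512,
        "interpolate": False,
        "negative_prompt": "",
        "color_correction": True,
    },
)
print(output)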
Input Parameters
- loop: Loop the video
- seed: Set a seed for reproducibility. Random by default.
- prompt: Text prompt describing what you want to see in your video
- image_1 (required): First input image
- image_2 (required): Second input image
- image_3: Third input image (optional)
- image_4: Fourth input image (optional)
- image_5: Fifth input image (optional)
- image_6: Sixth input image (optional)
- image_7: Seventh input image (optional)
- image_8: Eighth input image (optional)
- image_9: Ninth input image (optional)
- image_10: Tenth input image (optional)
- max_width: Maximum width of the video
- max_height: Maximum height of the video
- interpolate: Enable 2x frame interpolation using FILM
- negative_prompt: Things you do not want to see in your video
- color_correction: Disable this if the colors come out strange, or if the colors of your input images differ significantly (see the usage sketch after this list)
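To illustrate the optional knobs above, a hedged sketch of a run with a fixed seed, looping, and FILM interpolation. The image URLs here are hypothetical placeholders; only image_1 and image_2 are required:

import replicate

output = replicate.run(
    "fofr/tooncrafter:0486ff07368e816ec3d5c69b9581e7a09b55817f567a0d74caad9395c9295c77",
    input={
        "image_1": "https://example.com/frame-a.png",  # hypothetical URL
        "image_2": "https://example.com/frame-b.png",  # hypothetical URL
        "seed": 1500914532,         # fix the seed for reproducible output (this value appears in the logs below)
        "loop": True,               # loop the video back to the first frame
        "interpolate": True,        # 2x frame interpolation using FILM
        "color_correction": False,  # disable if colors drift or the inputs differ a lot
    },
)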
Output Schema
(schema widget omitted; the model outputs a video file)
Example Execution Logs
Random seed set to: 1500914532
Checking inputs
✅ /tmp/inputs/input_1.png
✅ /tmp/inputs/input_2.png
✅ /tmp/inputs/input_3.png
====================================
Checking weights
✅ tooncrafter_512_interp-fp16.safetensors
✅ stable-diffusion-2-1-clip-fp16.safetensors
✅ CLIP-ViT-H-fp16.safetensors
====================================
Running workflow
got prompt
Executing node 1, title: Load Image, class type: LoadImage
Downloading model to: /src/ComfyUI/models/checkpoints/dynamicrafter/tooncrafter_512_interp-fp16.safetensors
Executing node 52, title: DownloadAndLoadDynamiCrafterModel, class type: DownloadAndLoadDynamiCrafterModel
Fetching 1 files: 0%| | 0/1 [00:00<?, ?it/s]/root/.pyenv/versions/3.10.6/lib/python3.10/site-packages/huggingface_hub/file_download.py:1194: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
warnings.warn(
Fetching 1 files: 100%|██████████| 1/1 [00:13<00:00, 13.65s/it]
Loading model from: /src/ComfyUI/models/checkpoints/dynamicrafter/tooncrafter_512_interp-fp16.safetensors
LatentVisualDiffusion: Running in v-prediction mode
AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
vanilla
making attention of type 'vanilla' with 512 in_channels
memory-efficient-cross-attn-fusion
making attention of type 'memory-efficient-cross-attn-fusion' with 512 in_channels
memory-efficient-cross-attn-fusion
making attention of type 'memory-efficient-cross-attn-fusion' with 512 in_channels
>>> model checkpoint loaded.
Model using dtype: torch.float16
Executing node 61, title: DownloadAndLoadCLIPVisionModel, class type: DownloadAndLoadCLIPVisionModel
Loading model from: /src/ComfyUI/models/clip_vision/CLIP-ViT-H-fp16.safetensors
Executing node 59, title: DownloadAndLoadCLIPModel, class type: DownloadAndLoadCLIPModel
clip missing: ['text_model.encoder.layers.23.layer_norm1.weight', 'text_model.encoder.layers.23.layer_norm1.bias', 'text_model.encoder.layers.23.self_attn.q_proj.weight', 'text_model.encoder.layers.23.self_attn.q_proj.bias', 'text_model.encoder.layers.23.self_attn.k_proj.weight', 'text_model.encoder.layers.23.self_attn.k_proj.bias', 'text_model.encoder.layers.23.self_attn.v_proj.weight', 'text_model.encoder.layers.23.self_attn.v_proj.bias', 'text_model.encoder.layers.23.self_attn.out_proj.weight', 'text_model.encoder.layers.23.self_attn.out_proj.bias', 'text_model.encoder.layers.23.layer_norm2.weight', 'text_model.encoder.layers.23.layer_norm2.bias', 'text_model.encoder.layers.23.mlp.fc1.weight', 'text_model.encoder.layers.23.mlp.fc1.bias', 'text_model.encoder.layers.23.mlp.fc2.weight', 'text_model.encoder.layers.23.mlp.fc2.bias', 'text_projection.weight']
Loading model from: /src/ComfyUI/models/clip/stable-diffusion-2-1-clip-fp16.safetensors
Requested to load SD2ClipModel
Loading 1 new model
Executing node 49, title: CLIP Text Encode (Prompt), class type: CLIPTextEncode
Executing node 50, title: CLIP Text Encode (Prompt), class type: CLIPTextEncode
Executing node 70, title: 🔧 Image Resize, class type: ImageResize+
Executing node 2, title: Load Image, class type: LoadImage
Executing node 303, title: Load Image, class type: LoadImage
Executing node 28, title: Image Batch Multi, class type: ImageBatchMulti
Executing node 6, title: Get Image Size & Count, class type: GetImageSizeAndCount
Executing node 65, title: 🔧 Image Resize, class type: ImageResize+
Executing node 57, title: ToonCrafterInterpolation, class type: ToonCrafterInterpolation
VAE using dtype: torch.bfloat16
Requested to load CLIPVisionModelProjection
Loading 1 new model
DDIM Sampler: 100%|██████████| 20/20 [00:12<00:00, 1.61it/s]
DDIM Sampler: 100%|██████████| 20/20 [00:12<00:00, 1.63it/s]
Executing node 58, title: ToonCrafterDecode, class type: ToonCrafterDecode
VAE using dtype: torch.bfloat16
Using xformers
/root/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/nn/modules/conv.py:605: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv3d(
Using xformers
Executing node 67, title: Color Match, class type: ColorMatch
Executing node 29, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
Prompt executed in 59.29 seconds
outputs: {'6': {'text': ['3x512x512']}, '29': {'gifs': [{'filename': 'ToonCrafter_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4'}]}}
====================================
ToonCrafter_00001.png
ToonCrafter_00001.mp4
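The final artifact of this run is the MP4 listed above. A minimal sketch for saving the result locally, assuming replicate.run returns the output as a URL string (newer versions of the client return a FileOutput object instead, whose URL is available via str(output) or output.url):

import urllib.request

# `output` is the value returned by replicate.run(...) in the examples above.
url = str(output)  # works for both a plain URL string and a FileOutput
urllib.request.urlretrieve(url, "ToonCrafter_00001.mp4")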
Version Details
- Version ID: 0486ff07368e816ec3d5c69b9581e7a09b55817f567a0d74caad9395c9295c77
- Version Created: July 3, 2024