fofr/tooncrafter
About
Create videos from illustrated input images

Example Output
Performance Metrics
Prediction Time: 61.90s
Total Time: 138.48s
All Input Parameters
{ "loop": false, "prompt": "", "image_1": "https://replicate.delivery/pbxt/L1pQdyf4fPVRzU5WxhhHAdH2Eo05X3zhirvNzwAKJ80lA7Qh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-0.webp", "image_2": "https://replicate.delivery/pbxt/L1pQeBF582rKH3FFAYJCxdFUurBZ1axNFVwKxEd1wIALydhh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-1.webp", "image_3": "https://replicate.delivery/pbxt/L1pQdTPwSZxnfDkPkM3eArBmHWd5xttTnSkKBhszXJ88pIff/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-3.webp", "max_width": 512, "max_height": 512, "interpolate": false, "negative_prompt": "", "color_correction": true }
Input Parameters
- loop: Loop the video
- seed: Set a seed for reproducibility. Random by default.
- prompt
- image_1 (required): First input image
- image_2 (required): Second input image
- image_3: Third input image (optional)
- image_4: Fourth input image (optional)
- image_5: Fifth input image (optional)
- image_6: Sixth input image (optional)
- image_7: Seventh input image (optional)
- image_8: Eighth input image (optional)
- image_9: Ninth input image (optional)
- image_10: Tenth input image (optional; a helper for assembling these slots is sketched after this list)
- max_width: Maximum width of the video
- max_height: Maximum height of the video
- interpolate: Enable 2x frame interpolation using FILM
- negative_prompt: Things you do not want to see in your video
- color_correction: If the output colors look strange, or your input images have very different colors, disable this
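Because the image slots follow a fixed image_1 through image_10 naming scheme, assembling the input dict by hand is error-prone. Below is a hypothetical helper (the function name and validation are assumptions, not part of the model) that maps an ordered list of frame URLs onto the parameter names above:

def build_tooncrafter_input(frames, **options):
    """Map an ordered list of 2-10 frame URLs onto image_1..image_10."""
    if not 2 <= len(frames) <= 10:
        raise ValueError("ToonCrafter takes between 2 and 10 input images")
    inputs = {f"image_{i}": url for i, url in enumerate(frames, start=1)}
    inputs.update(options)  # e.g. loop=True, interpolate=True, max_width=512
    return inputs

For example, build_tooncrafter_input([url_a, url_b, url_c], interpolate=True) produces a dict with the same shape as the example input above.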
Output Schema
Output
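The output schema is not fully captured on this page; per the execution logs below, the model writes an H.264 MP4 (ToonCrafter_00001.mp4). As a hedged sketch, assuming replicate.run returns the output as a URL string (or a list of URL strings, depending on client version), the file can be saved like this:

import urllib.request

# `output` is the return value of replicate.run from the earlier sketch.
urls = output if isinstance(output, list) else [output]
for i, url in enumerate(urls):
    urllib.request.urlretrieve(url, f"tooncrafter_{i}.mp4")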
Example Execution Logs
Random seed set to: 1500914532
Checking inputs
✅ /tmp/inputs/input_1.png
✅ /tmp/inputs/input_2.png
✅ /tmp/inputs/input_3.png
====================================
Checking weights
✅ tooncrafter_512_interp-fp16.safetensors
✅ stable-diffusion-2-1-clip-fp16.safetensors
✅ CLIP-ViT-H-fp16.safetensors
====================================
Running workflow
got prompt
Executing node 1, title: Load Image, class type: LoadImage
Downloading model to: /src/ComfyUI/models/checkpoints/dynamicrafter/tooncrafter_512_interp-fp16.safetensors
Executing node 52, title: DownloadAndLoadDynamiCrafterModel, class type: DownloadAndLoadDynamiCrafterModel
/root/.pyenv/versions/3.10.6/lib/python3.10/site-packages/huggingface_hub/file_download.py:1194: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as `local_dir`. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Fetching 1 files: 100%|██████████| 1/1 [00:13<00:00, 13.65s/it]
Loading model from: /src/ComfyUI/models/checkpoints/dynamicrafter/tooncrafter_512_interp-fp16.safetensors
LatentVisualDiffusion: Running in v-prediction mode
AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
vanilla
making attention of type 'vanilla' with 512 in_channels
memory-efficient-cross-attn-fusion
making attention of type 'memory-efficient-cross-attn-fusion' with 512 in_channels
memory-efficient-cross-attn-fusion
making attention of type 'memory-efficient-cross-attn-fusion' with 512 in_channels
>>> model checkpoint loaded.
Model using dtype: torch.float16
Executing node 61, title: DownloadAndLoadCLIPVisionModel, class type: DownloadAndLoadCLIPVisionModel
Loading model from: /src/ComfyUI/models/clip_vision/CLIP-ViT-H-fp16.safetensors
Executing node 59, title: DownloadAndLoadCLIPModel, class type: DownloadAndLoadCLIPModel
clip missing: ['text_model.encoder.layers.23.layer_norm1.weight', 'text_model.encoder.layers.23.layer_norm1.bias', 'text_model.encoder.layers.23.self_attn.q_proj.weight', 'text_model.encoder.layers.23.self_attn.q_proj.bias', 'text_model.encoder.layers.23.self_attn.k_proj.weight', 'text_model.encoder.layers.23.self_attn.k_proj.bias', 'text_model.encoder.layers.23.self_attn.v_proj.weight', 'text_model.encoder.layers.23.self_attn.v_proj.bias', 'text_model.encoder.layers.23.self_attn.out_proj.weight', 'text_model.encoder.layers.23.self_attn.out_proj.bias', 'text_model.encoder.layers.23.layer_norm2.weight', 'text_model.encoder.layers.23.layer_norm2.bias', 'text_model.encoder.layers.23.mlp.fc1.weight', 'text_model.encoder.layers.23.mlp.fc1.bias', 'text_model.encoder.layers.23.mlp.fc2.weight', 'text_model.encoder.layers.23.mlp.fc2.bias', 'text_projection.weight']
Loading model from: /src/ComfyUI/models/clip/stable-diffusion-2-1-clip-fp16.safetensors
Requested to load SD2ClipModel
Loading 1 new model
Executing node 49, title: CLIP Text Encode (Prompt), class type: CLIPTextEncode
Executing node 50, title: CLIP Text Encode (Prompt), class type: CLIPTextEncode
Executing node 70, title: 🔧 Image Resize, class type: ImageResize+
Executing node 2, title: Load Image, class type: LoadImage
Executing node 303, title: Load Image, class type: LoadImage
Executing node 28, title: Image Batch Multi, class type: ImageBatchMulti
Executing node 6, title: Get Image Size & Count, class type: GetImageSizeAndCount
Executing node 65, title: 🔧 Image Resize, class type: ImageResize+
Executing node 57, title: ToonCrafterInterpolation, class type: ToonCrafterInterpolation
VAE using dtype: torch.bfloat16
Requested to load CLIPVisionModelProjection
Loading 1 new model
DDIM Sampler: 100%|██████████| 20/20 [00:12<00:00, 1.61it/s]
DDIM Sampler: 100%|██████████| 20/20 [00:12<00:00, 1.63it/s]
Executing node 58, title: ToonCrafterDecode, class type: ToonCrafterDecode
VAE using dtype: torch.bfloat16
Using xformers
/root/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/nn/modules/conv.py:605: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv3d(
Using xformers
Executing node 67, title: Color Match, class type: ColorMatch
Executing node 29, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
Prompt executed in 59.29 seconds
outputs: {'6': {'text': ['3x512x512']}, '29': {'gifs': [{'filename': 'ToonCrafter_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4'}]}}
====================================
ToonCrafter_00001.png
ToonCrafter_00001.mp4
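Logs like these can also be read programmatically while a prediction runs. A minimal sketch, assuming the predictions API of the Replicate Python client (create, reload, status, logs); the image URLs here are hypothetical placeholders, not real inputs:

import time
import replicate

prediction = replicate.predictions.create(
    version="0486ff07368e816ec3d5c69b9581e7a09b55817f567a0d74caad9395c9295c77",
    input={
        "image_1": "https://example.com/frame-1.png",  # placeholder URL
        "image_2": "https://example.com/frame-2.png",  # placeholder URL
    },
)
while prediction.status not in ("succeeded", "failed", "canceled"):
    time.sleep(5)
    prediction.reload()  # refresh status and logs from the API
print(prediction.status)
print(prediction.logs)  # the same kind of log stream shown above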
Version Details
- Version ID: 0486ff07368e816ec3d5c69b9581e7a09b55817f567a0d74caad9395c9295c77
- Version Created: July 3, 2024
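To pin this exact build rather than whatever version is latest, the ID above can be looked up through the client. A minimal sketch, assuming the models/versions accessors of the Replicate Python client:

import replicate

model = replicate.models.get("fofr/tooncrafter")
version = model.versions.get(
    "0486ff07368e816ec3d5c69b9581e7a09b55817f567a0d74caad9395c9295c77"
)
print(version.id, version.created_at)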