fofr/tooncrafter

60.6K runs · Jun 2024 · Cog 0.9.7 · GitHub · Paper · License
animation cartoon image-to-video video-frame-interpolation

About

Create videos from illustrated input images

Example Output

Performance Metrics

61.90s Prediction Time
138.48s Total Time
All Input Parameters
{
  "loop": false,
  "prompt": "",
  "image_1": "https://replicate.delivery/pbxt/L1pQdyf4fPVRzU5WxhhHAdH2Eo05X3zhirvNzwAKJ80lA7Qh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-0.webp",
  "image_2": "https://replicate.delivery/pbxt/L1pQeBF582rKH3FFAYJCxdFUurBZ1axNFVwKxEd1wIALydhh/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-1.webp",
  "image_3": "https://replicate.delivery/pbxt/L1pQdTPwSZxnfDkPkM3eArBmHWd5xttTnSkKBhszXJ88pIff/replicate-prediction-5cvynz9d91rgg0cfsvqschdpww-3.webp",
  "max_width": 512,
  "max_height": 512,
  "interpolate": false,
  "negative_prompt": "",
  "color_correction": true
}
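The payload above can be sanity-checked locally before submission. The sketch below is illustrative only, based on the parameter table on this page; `validate_input` is a hypothetical helper, not part of the Replicate client or this model's API.

```python
# Illustrative sketch: merge a ToonCrafter input payload with the documented
# defaults and check it against the documented constraints.
# `validate_input` is a hypothetical helper, not a real API.

def validate_input(params: dict) -> dict:
    merged = {
        "loop": False,
        "prompt": "",
        "negative_prompt": "",
        "max_width": 512,
        "max_height": 512,
        "interpolate": False,
        "color_correction": True,
    }
    merged.update(params)

    # image_1 and image_2 are required; image_3..image_10 are optional.
    for key in ("image_1", "image_2"):
        if key not in merged:
            raise ValueError(f"{key} is required")

    # max_width / max_height must fall in the documented 256-768 range.
    for key in ("max_width", "max_height"):
        if not 256 <= merged[key] <= 768:
            raise ValueError(f"{key} must be between 256 and 768")

    return merged


payload = validate_input({
    "image_1": "https://example.com/frame-0.webp",  # placeholder URL
    "image_2": "https://example.com/frame-1.webp",  # placeholder URL
    "max_width": 512,
})
```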
Input Parameters
loop · Type: boolean · Default: false
Loop the video
seed · Type: integer
Set a seed for reproducibility. Random by default.
prompt · Type: string · Default: ""
image_1 (required) · Type: string
First input image
image_2 (required) · Type: string
Second input image
image_3 · Type: string
Third input image (optional)
image_4 · Type: string
Fourth input image (optional)
image_5 · Type: string
Fifth input image (optional)
image_6 · Type: string
Sixth input image (optional)
image_7 · Type: string
Seventh input image (optional)
image_8 · Type: string
Eighth input image (optional)
image_9 · Type: string
Ninth input image (optional)
image_10 · Type: string
Tenth input image (optional)
max_width · Type: integer · Default: 512 · Range: 256–768
Maximum width of the video
max_height · Type: integer · Default: 512 · Range: 256–768
Maximum height of the video
interpolate · Type: boolean · Default: false
Enable 2x interpolation using FILM
negative_prompt · Type: string · Default: ""
Things you do not want to see in your video
color_correction · Type: boolean · Default: true
Disable this if the output colors look wrong, or if the colors of your input images differ significantly
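The `max_width`/`max_height` caps suggest that input frames are fit inside a bounding box. The model's actual resize logic is internal and not documented here, so the following is only a plausible sketch of the "maximum" semantics, with a hypothetical `fit_within` helper:

```python
# Hedged sketch: compute dimensions that fit within max_width/max_height
# while preserving aspect ratio. The model's real resize behaviour may
# differ (e.g. rounding to multiples of 8); this is purely illustrative.

def fit_within(width: int, height: int,
               max_width: int = 512, max_height: int = 512) -> tuple[int, int]:
    # Scale down by the tighter constraint; never upscale (scale capped at 1.0).
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)
```

For example, a 1024x512 input fit into the default 512x512 box comes out at 512x256.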
Output Schema

Output

Type: array · Items Type: string · Items Format: uri

Example Execution Logs
Random seed set to: 1500914532
Checking inputs
✅ /tmp/inputs/input_1.png
✅ /tmp/inputs/input_2.png
✅ /tmp/inputs/input_3.png
====================================
Checking weights
✅ tooncrafter_512_interp-fp16.safetensors
✅ stable-diffusion-2-1-clip-fp16.safetensors
✅ CLIP-ViT-H-fp16.safetensors
====================================
Running workflow
got prompt
Executing node 1, title: Load Image, class type: LoadImage
Downloading model to: /src/ComfyUI/models/checkpoints/dynamicrafter/tooncrafter_512_interp-fp16.safetensors
Executing node 52, title: DownloadAndLoadDynamiCrafterModel, class type: DownloadAndLoadDynamiCrafterModel
Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]/root/.pyenv/versions/3.10.6/lib/python3.10/site-packages/huggingface_hub/file_download.py:1194: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
warnings.warn(
Fetching 1 files: 100%|██████████| 1/1 [00:13<00:00, 13.65s/it]
Loading model from: /src/ComfyUI/models/checkpoints/dynamicrafter/tooncrafter_512_interp-fp16.safetensors
LatentVisualDiffusion: Running in v-prediction mode
AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
vanilla
making attention of type 'vanilla' with 512 in_channels
memory-efficient-cross-attn-fusion
making attention of type 'memory-efficient-cross-attn-fusion' with 512 in_channels
memory-efficient-cross-attn-fusion
making attention of type 'memory-efficient-cross-attn-fusion' with 512 in_channels
>>> model checkpoint loaded.
Model using dtype: torch.float16
Executing node 61, title: DownloadAndLoadCLIPVisionModel, class type: DownloadAndLoadCLIPVisionModel
Loading model from: /src/ComfyUI/models/clip_vision/CLIP-ViT-H-fp16.safetensors
Executing node 59, title: DownloadAndLoadCLIPModel, class type: DownloadAndLoadCLIPModel
clip missing: ['text_model.encoder.layers.23.layer_norm1.weight', 'text_model.encoder.layers.23.layer_norm1.bias', 'text_model.encoder.layers.23.self_attn.q_proj.weight', 'text_model.encoder.layers.23.self_attn.q_proj.bias', 'text_model.encoder.layers.23.self_attn.k_proj.weight', 'text_model.encoder.layers.23.self_attn.k_proj.bias', 'text_model.encoder.layers.23.self_attn.v_proj.weight', 'text_model.encoder.layers.23.self_attn.v_proj.bias', 'text_model.encoder.layers.23.self_attn.out_proj.weight', 'text_model.encoder.layers.23.self_attn.out_proj.bias', 'text_model.encoder.layers.23.layer_norm2.weight', 'text_model.encoder.layers.23.layer_norm2.bias', 'text_model.encoder.layers.23.mlp.fc1.weight', 'text_model.encoder.layers.23.mlp.fc1.bias', 'text_model.encoder.layers.23.mlp.fc2.weight', 'text_model.encoder.layers.23.mlp.fc2.bias', 'text_projection.weight']
Loading model from: /src/ComfyUI/models/clip/stable-diffusion-2-1-clip-fp16.safetensors
Requested to load SD2ClipModel
Loading 1 new model
Executing node 49, title: CLIP Text Encode (Prompt), class type: CLIPTextEncode
Executing node 50, title: CLIP Text Encode (Prompt), class type: CLIPTextEncode
Executing node 70, title: 🔧 Image Resize, class type: ImageResize+
Executing node 2, title: Load Image, class type: LoadImage
Executing node 303, title: Load Image, class type: LoadImage
Executing node 28, title: Image Batch Multi, class type: ImageBatchMulti
Executing node 6, title: Get Image Size & Count, class type: GetImageSizeAndCount
Executing node 65, title: 🔧 Image Resize, class type: ImageResize+
Executing node 57, title: ToonCrafterInterpolation, class type: ToonCrafterInterpolation
VAE using dtype: torch.bfloat16
Requested to load CLIPVisionModelProjection
Loading 1 new model
DDIM Sampler:   0%|          | 0/20 [00:00<?, ?it/s]
DDIM Sampler:   5%|▌         | 1/20 [00:00<00:13,  1.37it/s]
DDIM Sampler:  10%|█         | 2/20 [00:01<00:11,  1.51it/s]
DDIM Sampler:  15%|█▌        | 3/20 [00:01<00:10,  1.56it/s]
DDIM Sampler:  20%|██        | 4/20 [00:02<00:10,  1.59it/s]
DDIM Sampler:  25%|██▌       | 5/20 [00:03<00:09,  1.60it/s]
DDIM Sampler:  30%|███       | 6/20 [00:03<00:08,  1.61it/s]
DDIM Sampler:  35%|███▌      | 7/20 [00:04<00:08,  1.62it/s]
DDIM Sampler:  40%|████      | 8/20 [00:05<00:07,  1.62it/s]
DDIM Sampler:  45%|████▌     | 9/20 [00:05<00:06,  1.62it/s]
DDIM Sampler:  50%|█████     | 10/20 [00:06<00:06,  1.62it/s]
DDIM Sampler:  55%|█████▌    | 11/20 [00:06<00:05,  1.63it/s]
DDIM Sampler:  60%|██████    | 12/20 [00:07<00:04,  1.63it/s]
DDIM Sampler:  65%|██████▌   | 13/20 [00:08<00:04,  1.63it/s]
DDIM Sampler:  70%|███████   | 14/20 [00:08<00:03,  1.63it/s]
DDIM Sampler:  75%|███████▌  | 15/20 [00:09<00:03,  1.63it/s]
DDIM Sampler:  80%|████████  | 16/20 [00:09<00:02,  1.63it/s]
DDIM Sampler:  85%|████████▌ | 17/20 [00:10<00:01,  1.63it/s]
DDIM Sampler:  90%|█████████ | 18/20 [00:11<00:01,  1.63it/s]
DDIM Sampler:  95%|█████████▌| 19/20 [00:11<00:00,  1.63it/s]
DDIM Sampler: 100%|██████████| 20/20 [00:12<00:00,  1.63it/s]
DDIM Sampler: 100%|██████████| 20/20 [00:12<00:00,  1.61it/s]
DDIM Sampler:   0%|          | 0/20 [00:00<?, ?it/s]
DDIM Sampler:   5%|▌         | 1/20 [00:00<00:11,  1.63it/s]
DDIM Sampler:  10%|█         | 2/20 [00:01<00:11,  1.63it/s]
DDIM Sampler:  15%|█▌        | 3/20 [00:01<00:10,  1.63it/s]
DDIM Sampler:  20%|██        | 4/20 [00:02<00:09,  1.63it/s]
DDIM Sampler:  25%|██▌       | 5/20 [00:03<00:09,  1.63it/s]
DDIM Sampler:  30%|███       | 6/20 [00:03<00:08,  1.63it/s]
DDIM Sampler:  35%|███▌      | 7/20 [00:04<00:07,  1.63it/s]
DDIM Sampler:  40%|████      | 8/20 [00:04<00:07,  1.63it/s]
DDIM Sampler:  45%|████▌     | 9/20 [00:05<00:06,  1.63it/s]
DDIM Sampler:  50%|█████     | 10/20 [00:06<00:06,  1.63it/s]
DDIM Sampler:  55%|█████▌    | 11/20 [00:06<00:05,  1.63it/s]
DDIM Sampler:  60%|██████    | 12/20 [00:07<00:04,  1.63it/s]
DDIM Sampler:  65%|██████▌   | 13/20 [00:07<00:04,  1.63it/s]
DDIM Sampler:  70%|███████   | 14/20 [00:08<00:03,  1.63it/s]
DDIM Sampler:  75%|███████▌  | 15/20 [00:09<00:03,  1.63it/s]
DDIM Sampler:  80%|████████  | 16/20 [00:09<00:02,  1.63it/s]
DDIM Sampler:  85%|████████▌ | 17/20 [00:10<00:01,  1.63it/s]
DDIM Sampler:  90%|█████████ | 18/20 [00:11<00:01,  1.63it/s]
DDIM Sampler:  95%|█████████▌| 19/20 [00:11<00:00,  1.63it/s]
DDIM Sampler: 100%|██████████| 20/20 [00:12<00:00,  1.63it/s]
Executing node 58, title: ToonCrafterDecode, class type: ToonCrafterDecode
VAE using dtype: torch.bfloat16
Using xformers
/root/.pyenv/versions/3.10.6/lib/python3.10/site-packages/torch/nn/modules/conv.py:605: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return F.conv3d(
Using xformers
Executing node 67, title: Color Match, class type: ColorMatch
Executing node 29, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine
Prompt executed in 59.29 seconds
outputs:  {'6': {'text': ['3x512x512']}, '29': {'gifs': [{'filename': 'ToonCrafter_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4'}]}}
====================================
ToonCrafter_00001.png
ToonCrafter_00001.mp4
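The final `outputs` line in the log is a JSON-like dict keyed by workflow node ID. A small sketch of pulling the generated video filename out of such a structure, using only keys that appear in the log above:

```python
# Sketch: extract video filenames from the workflow's `outputs` dict,
# as printed at the end of the execution log. Node "29" is the
# VHS_VideoCombine node, whose files appear under the "gifs" key.
outputs = {
    "6": {"text": ["3x512x512"]},
    "29": {"gifs": [{"filename": "ToonCrafter_00001.mp4",
                     "subfolder": "",
                     "type": "output",
                     "format": "video/h264-mp4"}]},
}

videos = [f["filename"]
          for node in outputs.values()
          for f in node.get("gifs", [])]
```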
Version Details
Version ID
0486ff07368e816ec3d5c69b9581e7a09b55817f567a0d74caad9395c9295c77
Version Created
July 3, 2024
Run on Replicate →