camenduru/story-diffusion 🔢📝❓🖼️ → 🖼️
About
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Performance Metrics
- Prediction Time: 56.06s
- Total Time: 365.86s
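The gap between the two timings can be worked out directly. This is a minimal sketch; it assumes (the page does not say so) that Total Time also covers queueing and model setup, so the difference is labeled as overhead only for illustration.

```python
# Timings reported above, in seconds.
prediction_time = 56.06
total_time = 365.86

# Time not spent inside the prediction itself. Attributing it to queueing
# and cold-start setup is an assumption, not something the page states.
overhead = round(total_time - prediction_time, 2)

# Share of the total wall-clock time spent on the prediction, in percent.
prediction_share = round(100 * prediction_time / total_time, 1)

print(f"overhead: {overhead}s")
print(f"prediction share of total: {prediction_share}%")
```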
All Input Parameters
{
"sa32_": 0.5,
"sa64_": 0.5,
"seed_": 1,
"style": "Japanese Anime",
"G_width": 768,
"sd_type": "Unstable",
"G_height": 768,
"num_steps": 50,
"comic_type": "Classic Comic Style",
"id_length_": 3,
"model_type": "Using Ref Images",
"input_image": "https://replicate.delivery/pbxt/KqySXsVmWku71q5LZeNjgasK4oVRILdFPt9dKKCEYG5ZFVko/1%20%281%29.jpeg",
"prompt_array": "wake up in the bed\nhave breakfast\nis on the road, go to company\nwork in the company\nTake a walk next to the company at noon\nlying in bed at night",
"general_prompt": "a woman img, wearing a white T-shirt, blue loose hair",
"guidance_scale": 5,
"negative_prompt": "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs",
"Ip_Adapter_Strength": 0.5,
"style_strength_ratio": 20
}
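A minimal sketch of how the parameters above might be submitted with the Replicate Python client. `build_input` and `run_prediction` are hypothetical helper names introduced here; the version hash is the Version ID listed on this page. Only a subset of fields is set explicitly, on the assumption that omitted optional fields fall back to the model's defaults.

```python
import json

# Model reference: owner/name plus the Version ID shown on this page.
MODEL_REF = (
    "camenduru/story-diffusion:"
    "a43c7e0e4bce75ee98445b20b244240d1109e30a46bf7719958fd0a69ab29e8e"
)


def build_input(frames, general_prompt, input_image, **overrides):
    """Assemble the input dict (hypothetical helper).

    `frames` is a list with one description per comic frame; the model
    expects them joined by newlines in `prompt_array`.
    """
    payload = {
        "sa32_": 0.5,
        "sa64_": 0.5,
        "seed_": 1,
        "style": "Japanese Anime",
        "G_width": 768,
        "G_height": 768,
        "num_steps": 50,
        "comic_type": "Classic Comic Style",
        "id_length_": 3,
        "model_type": "Using Ref Images",
        "input_image": input_image,
        "prompt_array": "\n".join(frames),
        "general_prompt": general_prompt,
        "guidance_scale": 5,
    }
    payload.update(overrides)  # caller-supplied values win over the example values
    return payload


def run_prediction(payload):
    """Submit the prediction. Needs `pip install replicate` and the
    REPLICATE_API_TOKEN environment variable; imported lazily so that
    building a payload works without the client installed."""
    import replicate

    return replicate.run(MODEL_REF, input=payload)


if __name__ == "__main__":
    example = build_input(
        ["wake up in the bed", "have breakfast"],
        "a woman img, wearing a white T-shirt, blue loose hair",
        "https://example.com/reference.jpeg",  # placeholder URL, not from the page
    )
    print(json.dumps(example, indent=2))
```

Keeping the network call in its own function means the payload can be inspected or logged before any prediction is started.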
Input Parameters
- sa32_: Degree of Paired Attention at the 32 x 32 self-attention layers
- sa64_: Degree of Paired Attention at the 64 x 64 self-attention layers
- seed_: Random seed
- style: Style template, one of '(No style)', 'Japanese Anime', 'Cinematic', 'Disney Character', 'Photographic', 'Comic book', 'Line art'
- G_width: Width
- G_height: Height
- num_steps: Number of sampling steps
- comic_type: Typesetting style
- id_length_: Number of ID images among the total images
- model_type: Control type of the character
- input_image (required)
- prompt_array: Comic description (each line corresponds to a frame)
- general_prompt: Textual description of the character
- guidance_scale: Guidance scale
- negative_prompt: Negative prompt
- Ip_Adapter_Strength: IP-Adapter strength
- style_strength_ratio: Style strength of the reference image (%)
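How `prompt_array`, `general_prompt`, and `id_length_` interact can be sketched from the example logs on this page: each frame line is appended to the character description after a comma, and the first `id_length_` prompts are sampled together as one batch. The helper names below are hypothetical, and the code only mirrors what the logs show; it is not the model's own implementation.

```python
# Style templates listed for the `style` parameter.
STYLE_TEMPLATES = (
    "(No style)", "Japanese Anime", "Cinematic", "Disney Character",
    "Photographic", "Comic book", "Line art",
)


def expand_prompts(general_prompt, prompt_array):
    """One prompt per non-empty line: '<general_prompt>,<frame description>'."""
    frames = [line.strip() for line in prompt_array.splitlines() if line.strip()]
    return [f"{general_prompt},{frame}" for frame in frames]


def split_by_id_length(prompts, id_length):
    """Split into the identity batch (first `id_length` prompts) and the rest."""
    return prompts[:id_length], prompts[id_length:]


def check_style(style):
    """Reject style names that are not in the documented template list."""
    if style not in STYLE_TEMPLATES:
        raise ValueError(f"unknown style template: {style!r}")
    return style


if __name__ == "__main__":
    prompts = expand_prompts(
        "a woman img, wearing a white T-shirt, blue loose hair",
        "wake up in the bed\nhave breakfast\nis on the road, go to company\n"
        "work in the company\nTake a walk next to the company at noon\n"
        "lying in bed at night",
    )
    id_prompts, remaining = split_by_id_length(prompts, 3)
    print(len(id_prompts), "identity prompts,", len(remaining), "remaining")
```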
Output Schema
Output
Example Execution Logs
successsfully load paired self-attention
number of the processor : 36
/content/StoryDiffusion-hf/examples/taylor/1.jpeg
start_merge_step:10
['a woman img, wearing a white T-shirt, blue loose hair,wake up in the bed', 'a woman img, wearing a white T-shirt, blue loose hair,have breakfast', 'a woman img, wearing a white T-shirt, blue loose hair,is on the road, go to company', 'a woman img, wearing a white T-shirt, blue loose hair,work in the company', 'a woman img, wearing a white T-shirt, blue loose hair,Take a walk next to the company at noon', 'a woman img, wearing a white T-shirt, blue loose hair,lying in bed at night']
anime artwork illustrating a woman, wearing a white t - shirt, blue loose hair, wake up in the bed. created by japanese anime studio. highly emotional. best quality, high resolution
anime artwork illustrating a woman, wearing a white t - shirt, blue loose hair, have breakfast. created by japanese anime studio. highly emotional. best quality, high resolution
anime artwork illustrating a woman, wearing a white t - shirt, blue loose hair, is on the road, go to company. created by japanese anime studio. highly emotional. best quality, high resolution
torch.Size([3, 77, 2048]) torch.Size([3, 77, 2048]) torch.Size([3, 77, 2048]) torch.Size([3, 1280]) torch.Size([3, 1280]) torch.Size([3, 1280]) torch.Size([3, 4, 96, 96]) torch.Size([6, 6])
100%|██████████| 50/50 [00:18<00:00, 2.70it/s]
anime artwork illustrating a woman, wearing a white t - shirt, blue loose hair, work in the company. created by japanese anime studio. highly emotional. best quality, high resolution
torch.Size([1, 77, 2048]) torch.Size([1, 77, 2048]) torch.Size([1, 77, 2048]) torch.Size([1, 1280]) torch.Size([1, 1280]) torch.Size([1, 1280]) torch.Size([1, 4, 96, 96]) torch.Size([2, 6])
100%|██████████| 50/50 [00:08<00:00, 6.13it/s]
anime artwork illustrating a woman, wearing a white t - shirt, blue loose hair, take a walk next to the company at noon. created by japanese anime studio. highly emotional. best quality, high resolution
torch.Size([1, 77, 2048]) torch.Size([1, 77, 2048]) torch.Size([1, 77, 2048]) torch.Size([1, 1280]) torch.Size([1, 1280]) torch.Size([1, 1280]) torch.Size([1, 4, 96, 96]) torch.Size([2, 6])
100%|██████████| 50/50 [00:08<00:00, 6.17it/s]
anime artwork illustrating a woman, wearing a white t - shirt, blue loose hair, lying in bed at night. created by japanese anime studio. highly emotional. best quality, high resolution
torch.Size([1, 77, 2048]) torch.Size([1, 77, 2048]) torch.Size([1, 77, 2048]) torch.Size([1, 1280]) torch.Size([1, 1280]) torch.Size([1, 1280]) torch.Size([1, 4, 96, 96]) torch.Size([2, 6])
100%|██████████| 50/50 [00:08<00:00, 6.13it/s]
2 [[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAB60>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAF50>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAA70>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CBC40>]]
-2 [[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAB60>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAF50>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAA70>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CBC40>], [<PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CACE0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAD10>]]
[[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAB60>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAF50>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAA70>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CBC40>], [<PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CACE0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F59B44CAD10>, <PIL.Image.Image image mode=RGBA size=788x788 at 0x7F59B44CAC50>, <PIL.Image.Image image mode=RGBA size=788x788 at 0x7F59B44CAC50>]]
0 (214, 718)
0 (201, 717)
0 (1, 717)
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
successsfully load paired self-attention
number of the processor : 36
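Two figures in the logs can be cross-checked against the input parameters. The sketch below assumes the usual Stable Diffusion latent layout (4 channels, 8x spatial downsampling), which this page does not state, and estimates sampling time from the final it/s value of each of the four progress bars. `latent_shape` and `sampling_seconds` are hypothetical helper names.

```python
def latent_shape(batch, width, height, channels=4, downsample=8):
    """Latent tensor shape for a given image size.

    `channels=4` and `downsample=8` are assumed defaults (typical for
    Stable Diffusion VAEs), not values confirmed by this page.
    """
    return (batch, channels, height // downsample, width // downsample)


def sampling_seconds(num_steps, its_per_second):
    """Approximate wall-clock time of one sampling run."""
    return num_steps / its_per_second


# Identity batch: id_length_ = 3 images at G_width = G_height = 768.
print(latent_shape(3, 768, 768))  # the logs report torch.Size([3, 4, 96, 96])

# Final rates of the four progress bars in the logs (it/s), 50 steps each.
final_rates = [2.70, 6.13, 6.17, 6.13]
total_sampling = sum(sampling_seconds(50, rate) for rate in final_rates)
print(f"approx. sampling time: {total_sampling:.1f}s")
```

Under these assumptions the denoising loops account for most, but not all, of the 56.06s prediction time; the remainder would cover steps such as encoding, decoding, and comic typesetting.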
Version Details
- Version ID: a43c7e0e4bce75ee98445b20b244240d1109e30a46bf7719958fd0a69ab29e8e
- Version Created: May 3, 2024