adirik/masactrl-stable-diffusion-v1-4 🔢🖼️📝 → 🖼️

▶️ 2.6K runs 📅 Nov 2023 ⚙️ Cog 0.9.0-beta10 🔗 GitHub 📄 Paper ⚖️ License
image-consistent-character-generation image-editing image-to-image text-to-image

About

Edit real or generated images

Example Output

Performance Metrics

25.26s Prediction Time
25.30s Total Time
All Input Parameters
{
  "source_image": "https://replicate.delivery/pbxt/JzivdO6ZRyGFooLwygTxvFlaTjctken1O6FxdIbLQGVZFhPN/corgi.jpg",
  "target_prompt": "a photo of a running corgi",
  "guidance_scale": 7.5,
  "masactrl_start_step": 4,
  "num_inference_steps": 50,
  "masactrl_start_layer": 10
}
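The example input above can be sent to the model programmatically. The following is a minimal sketch, assuming the `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The model reference combines the model name with the version ID listed under Version Details; the helper names `build_example_input` and `run_example` are hypothetical and only used for illustration.

```python
# Model reference in "owner/name:version" form, built from this page's
# model name and version ID.
MODEL_REF = (
    "adirik/masactrl-stable-diffusion-v1-4:"
    "4e86d80ab64a8395e7fd327d34fe85d240a3d9e8706b7144864ba981eba3dfa6"
)


def build_example_input() -> dict:
    """Return the example payload shown above (image editing mode)."""
    return {
        "source_image": (
            "https://replicate.delivery/pbxt/"
            "JzivdO6ZRyGFooLwygTxvFlaTjctken1O6FxdIbLQGVZFhPN/corgi.jpg"
        ),
        "target_prompt": "a photo of a running corgi",
        "guidance_scale": 7.5,
        "masactrl_start_step": 4,
        "num_inference_steps": 50,
        "masactrl_start_layer": 10,
    }


def run_example():
    """Run the example prediction (hypothetical helper).

    Assumption: the `replicate` package is installed and authenticated.
    It is imported lazily so the payload helper works without it.
    """
    import replicate

    return replicate.run(MODEL_REF, input=build_example_input())
```

Calling `run_example()` should return the model output described under Output Schema; the exact Python type of each item may depend on the client version.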
Input Parameters
seed Type: integer
Random seed. A random seed is used if none is provided.
source_image Type: string
Image to edit in image editing mode.
source_prompt Type: string
Prompt for the first image in consistent image synthesis mode.
target_prompt (required) Type: string
In consistent image synthesis mode, the prompt for the second image. In image editing mode, the prompt for the target image.
guidance_scale Type: number, Default: 7.5, Range: 1 - 50
Scale for classifier-free guidance.
masactrl_start_step Type: integer, Default: 4, Range: 1 - 100
The denoising step at which mutual self-attention control starts. It should be lower than num_inference_steps.
num_inference_steps Type: integer, Default: 50, Range: 1 - 100
Number of denoising steps.
masactrl_start_layer Type: integer, Default: 10, Range: 1 - 16
The U-Net layer at which mutual self-attention control starts.
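The documented defaults, ranges, and the constraint between masactrl_start_step and num_inference_steps can be checked on the client side before submitting a prediction. This is a sketch only: the function name `validate_inputs` is hypothetical, and the checks mirror the parameter list above rather than the model's actual server-side validation.

```python
# Defaults and inclusive ranges copied from the parameter list above.
DEFAULTS = {
    "guidance_scale": 7.5,
    "masactrl_start_step": 4,
    "num_inference_steps": 50,
    "masactrl_start_layer": 10,
}
RANGES = {
    "guidance_scale": (1, 50),
    "masactrl_start_step": (1, 100),
    "num_inference_steps": (1, 100),
    "masactrl_start_layer": (1, 16),
}


def validate_inputs(inputs: dict) -> dict:
    """Fill in documented defaults and check documented ranges."""
    if not inputs.get("target_prompt"):
        raise ValueError("target_prompt is required")

    merged = {**DEFAULTS, **inputs}
    for name, (low, high) in RANGES.items():
        if not low <= merged[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # The page states the start step should be lower than the step count.
    if merged["masactrl_start_step"] >= merged["num_inference_steps"]:
        raise ValueError(
            "masactrl_start_step should be lower than num_inference_steps"
        )
    return merged
```

For example, `validate_inputs({"target_prompt": "a photo of a running corgi"})` returns the payload with the four documented defaults filled in.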
Output Schema

Output

Type: array, Items Type: string, Items Format: uri
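Because the output is an array of URI strings, a caller typically iterates over it and saves each image. The sketch below only derives a local filename for each URI; the function name `filenames_for` and the fallback naming pattern are assumptions made for illustration, not part of the model's interface.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filenames_for(output_uris: list[str]) -> list[str]:
    """Derive a local filename for each output URI.

    Falls back to a hypothetical "output_<index>.png" name when the
    URI path has no final component.
    """
    names = []
    for index, uri in enumerate(output_uris):
        name = PurePosixPath(urlparse(uri).path).name
        names.append(name or f"output_{index}.png")
    return names
```

Each URI can then be fetched with any HTTP client, for example `urllib.request.urlretrieve(uri, name)` from the standard library.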

Example Execution Logs
Seed set to 2283628953
input text embeddings : torch.Size([1, 77, 768])
latents shape:  torch.Size([1, 4, 64, 64])
Valid timesteps:  tensor([  1,  21,  41,  61,  81, 101, 121, 141, 161, 181, 201, 221, 241, 261,
281, 301, 321, 341, 361, 381, 401, 421, 441, 461, 481, 501, 521, 541,
561, 581, 601, 621, 641, 661, 681, 701, 721, 741, 761, 781, 801, 821,
841, 861, 881, 901, 921, 941, 961, 981])
DDIM Inversion:   0%|          | 0/50 [00:00<?, ?it/s]
DDIM Inversion: 100%|██████████| 50/50 [00:06<00:00,  7.16it/s]
MasaCtrl at denoising steps:  [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
MasaCtrl at U-Net layers:  [10, 11, 12, 13, 14, 15]
input text embeddings : torch.Size([2, 77, 768])
latents shape:  torch.Size([2, 4, 64, 64])
DDIM Sampler:   0%|          | 0/50 [00:00<?, ?it/s]
DDIM Sampler: 100%|██████████| 50/50 [00:15<00:00,  3.13it/s]
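The "Valid timesteps", "MasaCtrl at denoising steps", and "MasaCtrl at U-Net layers" lines in the log above follow directly from the example inputs. The sketch below reproduces them under stated assumptions: 1000 training timesteps and a timestep offset of 1 are inferred from the logged values, and 16 total attention layers is inferred from the masactrl_start_layer range. The function names are hypothetical and this is not the model's own code.

```python
def ddim_timesteps(
    num_inference_steps: int,
    num_train_timesteps: int = 1000,  # assumption inferred from the log
    offset: int = 1,                  # assumption inferred from the log
) -> list[int]:
    """Evenly spaced timesteps, in ascending order as printed in the log."""
    stride = num_train_timesteps // num_inference_steps
    return [i * stride + offset for i in range(num_inference_steps)]


def masactrl_schedule(
    start_step: int,
    start_layer: int,
    num_inference_steps: int,
    num_layers: int = 16,  # assumption: upper bound of masactrl_start_layer
) -> tuple[list[int], list[int]]:
    """Steps and layers at which mutual self-attention control is active."""
    steps = list(range(start_step, num_inference_steps))
    layers = list(range(start_layer, num_layers))
    return steps, layers


timesteps = ddim_timesteps(50)
steps, layers = masactrl_schedule(
    start_step=4, start_layer=10, num_inference_steps=50
)
print(timesteps[:3], timesteps[-1])  # [1, 21, 41] 981
print(steps[0], steps[-1])           # 4 49
print(layers)                        # [10, 11, 12, 13, 14, 15]
```

With the defaults, control is active for 46 of the 50 denoising steps and for the last 6 layers, which matches the logged lists.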
Version Details
Version ID
4e86d80ab64a8395e7fd327d34fe85d240a3d9e8706b7144864ba981eba3dfa6
Version Created
December 5, 2023