joetm/camerabooth-openpose-style 📝🔢❓🖼️✓ → 🖼️

▶️ 478 runs 📅 Sep 2023 ⚙️ Cog 0.8.6
image-style-transfer image-to-image openpose

About

Stable Diffusion 1.4 with T2I-Adapter (ControlNet) style and openpose adapters, followed by two upscaling passes with Real-ESRGAN.

Example Output

[Example output image]

Performance Metrics

45.64s Prediction Time
45.67s Total Time
All Input Parameters
{
  "code": "",
  "seed": 0,
  "booth": "public",
  "image": "https://replicate.delivery/pbxt/JdXiguR2wRjl03jOe4fLpfrEEewitrSFTStYxiBpM6Qa6Xqm/oliver-ragfelt-m79taQSsQIQ-unsplash.jpg",
  "steps": 50,
  "prompt": "",
  "artwork": "https://replicate.delivery/pbxt/JdXihCVozhMdqBwECWdGRkbbgib7elGQnmJcheaupHfgKTIt/child-with-dove-1901.jpg",
  "cond_tau": 1,
  "clip_mode": "best",
  "n_samples": 1,
  "timestamp": 0,
  "add_prompt": "",
  "neg_prompt": "(((nsfw))), (((text))), (((words))), ((low quality)), worst quality, bad quality, ((bad art)), lowres, ((disfigured)), ((deformed)), ((mutilated)), glitch, ((distorted)), malformed, mutated, (((disfigured))), misaligned, poorly drawn, (((blurry))), ((blurred)), mutated, bad arms, ((extra limbs)), missing arms, missing legs, disconnected limbs, bad hands, ((poorly drawn hands)), ((poorly drawn face)), deformed hands, ((extra fingers)), ((extra legs)), fused fingers, (too many fingers), mutated hands, amputated limbs, no arms, extra arms, multiple arms, more than two legs, long neck, bad proportions, longbody, bad anatomy, missing fingers, extra digit, fewer digits, deformed eyes, poorly drawn eyes, cross-eye, ((cross-eyed)), bad skin, multiple, duplicated, by Bad Artist, monochrome, monotone, grayscale, b&w, sketches, speech bubble, signature, watermark, border, logo, ((morbid)), canvas frame, frame, 3d",
  "face_enhance": true,
  "style_weight": 1,
  "use_upscaler": true,
  "detect_prompt": true,
  "guidance_scale": 8,
  "style_cond_tau": 0.8,
  "upscale_factor": 2,
  "openpose_weight": 1
}
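
The inputs above could also be submitted from a script. The following is a minimal sketch, assuming the official `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. `build_input` and `run_prediction` are hypothetical helpers, not part of the model; the version ID is the one listed under Version Details, and only a handful of the documented fields are set explicitly.

```python
MODEL = "joetm/camerabooth-openpose-style"
VERSION = "1833bc8044109a1d7767f6bcffa05e55fbf35eccba8a2f1e9b1be16b9ca56801"


def build_input(image_url: str, artwork_url: str, **overrides) -> dict:
    """Assemble an input payload (hypothetical helper for illustration)."""
    payload = {
        "image": image_url,      # required: input pose image
        "artwork": artwork_url,  # required: input artwork image
        "steps": 50,
        "guidance_scale": 8,
        "detect_prompt": True,
        "use_upscaler": True,
        "upscale_factor": 2,
    }
    payload.update(overrides)  # caller-supplied values win over the defaults
    return payload


def run_prediction(payload: dict):
    """Submit the payload; needs network access and an API token."""
    # Imported lazily so build_input works without the client installed.
    import replicate

    return replicate.run(f"{MODEL}:{VERSION}", input=payload)


# Example (not executed here):
# outputs = run_prediction(build_input(pose_url, artwork_url, steps=30))
```

Fields left out of the payload fall back to the defaults listed under Input Parameters. The exact type of the returned items may depend on the client version, so treat the return value as a sequence of output references rather than assuming plain strings.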
Input Parameters
code Type: string, Default: (empty)
The keyphrase given to participants. Autofilled - leave blank.
seed Type: integer, Default: 0
Random seed (optional). Set to 0 to randomize the seed.
booth Default: public
The booth version (public or private). Autofilled - leave blank.
image (required) Type: string
Input pose image
steps Type: integer, Default: 50, Range: 1 - 100
Number of denoising steps. Default: 50.
prompt Type: string, Default: (empty)
Input prompt (optional). This field is ignored if detect_prompt = True.
artwork (required) Type: string
Input artwork image
cond_tau Type: number, Default: 1, Range: 0 - 1
Conditioning tau. Default: 1.0.
clip_mode Default: best
CLIP Interrogator prompt mode (best takes 10-20 seconds, fast takes 1-2 seconds). Default: best. This field is ignored if detect_prompt = False.
n_samples Type: integer, Default: 1, Range: 1 - 4
Number of images to generate. Default: 1.
timestamp Type: integer, Default: 0
Unix timestamp. Autofilled - leave blank.
add_prompt Type: string, Default: (empty)
Additional prompt (optional), appended to the prompt.
neg_prompt Type: string, Default: (((nsfw))), (((text))), (((words))), ((low quality)), worst quality, bad quality, ((bad art)), lowres, ((disfigured)), ((deformed)), ((mutilated)), glitch, ((distorted)), malformed, mutated, (((disfigured))), misaligned, poorly drawn, (((blurry))), ((blurred)), mutated, bad arms, ((extra limbs)), missing arms, missing legs, disconnected limbs, bad hands, ((poorly drawn hands)), ((poorly drawn face)), deformed hands, ((extra fingers)), ((extra legs)), fused fingers, (too many fingers), mutated hands, amputated limbs, no arms, extra arms, multiple arms, more than two legs, long neck, bad proportions, longbody, bad anatomy, missing fingers, extra digit, fewer digits, deformed eyes, poorly drawn eyes, cross-eye, ((cross-eyed)), bad skin, multiple, duplicated, by Bad Artist, monochrome, monotone, grayscale, b&w, sketches, speech bubble, signature, watermark, border, logo, ((morbid)), canvas frame, frame, 3d
Negative prompt (optional).
face_enhance Type: boolean, Default: true
Run GFPGAN face enhancement along with upscaling
style_weight Type: number, Default: 1, Range: 0 - 1
Weight of the style adapter. Default: 1.0.
use_upscaler Type: boolean, Default: true
Upscale the resulting image(s) by upscale_factor?
detect_prompt Type: boolean, Default: true
Use CLIP-Interrogator to detect prompt from the artwork. Default: True.
guidance_scale Type: number, Default: 8, Range: 1 - 50
Scale for classifier-free guidance. Default: 8.
style_cond_tau Type: number, Default: 0.8, Range: 0 - 1
Style conditioning tau. Default: 0.8.
upscale_factor Type: number, Default: 2, Range: 0 - 4
Factor to scale image by
openpose_weight Type: number, Default: 1, Range: 0 - 1
Weight of the openpose adapter. Default: 1.0.
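
The ranges and required fields listed above can be checked on the client side before a prediction is submitted. The following is a sketch under that assumption; `validate_inputs` is a hypothetical helper, and the model may enforce these constraints differently on its own side.

```python
# Ranges copied from the parameter list above: name -> (min, max).
RANGES = {
    "steps": (1, 100),
    "cond_tau": (0, 1),
    "n_samples": (1, 4),
    "style_weight": (0, 1),
    "guidance_scale": (1, 50),
    "style_cond_tau": (0, 1),
    "upscale_factor": (0, 4),
    "openpose_weight": (0, 1),
}
REQUIRED = ("image", "artwork")


def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = [
        f"missing required input: {name}"
        for name in REQUIRED
        if not inputs.get(name)
    ]
    for name, (low, high) in RANGES.items():
        if name in inputs and not (low <= inputs[name] <= high):
            problems.append(f"{name}={inputs[name]} is outside {low} - {high}")
    # Per the parameter list, prompt is ignored while detect_prompt is true
    # (and detect_prompt defaults to true).
    if inputs.get("detect_prompt", True) and inputs.get("prompt"):
        problems.append("prompt is ignored while detect_prompt is true")
    return problems
```

For example, `validate_inputs({"image": url, "artwork": art, "steps": 0})` flags the out-of-range `steps` value, while omitting optional fields produces no findings because the documented defaults apply.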
Output Schema

Output

Type: array, Items Type: string, Items Format: uri
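
Because the output is an array of URI strings, a caller would typically fetch each URI to obtain the image files. The following is a minimal sketch using only the Python standard library; `local_name` and `download_outputs` are hypothetical helpers, and the `.png` fallback extension is an assumption based on the filenames in the example logs.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name(uri: str, index: int) -> str:
    """Derive a local filename from an output URI, with an indexed fallback."""
    name = os.path.basename(urlparse(uri).path)
    return name or f"output-{index}.png"  # assumed extension


def download_outputs(uris: list[str], target_dir: str = "outputs") -> list[str]:
    """Download every output URI into target_dir and return the saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for index, uri in enumerate(uris):
        path = os.path.join(target_dir, local_name(uri, index))
        urllib.request.urlretrieve(uri, path)  # requires network access
        saved.append(path)
    return saved
```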

Example Execution Logs
Using negative prompt: (((nsfw))), (((text))), (((words))), ((low quality)), worst quality, bad quality, ((bad art)), lowres, ((disfigured)), ((deformed)), ((mutilated)), glitch, ((distorted)), malformed, mutated, (((disfigured))), misaligned, poorly drawn, (((blurry))), ((blurred)), mutated, bad arms, ((extra limbs)), missing arms, missing legs, disconnected limbs, bad hands, ((poorly drawn hands)), ((poorly drawn face)), deformed hands, ((extra fingers)), ((extra legs)), fused fingers, (too many fingers), mutated hands, amputated limbs, no arms, extra arms, multiple arms, more than two legs, long neck, bad proportions, longbody, bad anatomy, missing fingers, extra digit, fewer digits, deformed eyes, poorly drawn eyes, cross-eye, ((cross-eyed)), bad skin, multiple, duplicated, by Bad Artist, monochrome, monotone, grayscale, b&w, sketches, speech bubble, signature, watermark, border, logo, ((morbid)), canvas frame, frame, 3d
Detecting prompt with CI (ViT-L-14/openai)...
Global seed set to 12087
Loading caption model blip-large...
Loading CLIP model ViT-L-14/openai...
Loaded CLIP model and data in 3.96 seconds.
  0%|          | 0/55 [00:00<?, ?it/s]
 80%|████████  | 44/55 [00:00<00:00, 431.30it/s]
100%|██████████| 55/55 [00:00<00:00, 436.43it/s]
Flavor chain:   0%|          | 0/32 [00:00<?, ?it/s]
Flavor chain:   3%|▎         | 1/32 [00:00<00:25,  1.23it/s]
Flavor chain:   6%|▋         | 2/32 [00:01<00:24,  1.21it/s]
Flavor chain:   9%|▉         | 3/32 [00:02<00:24,  1.19it/s]
Flavor chain:  12%|█▎        | 4/32 [00:03<00:23,  1.18it/s]
Flavor chain:  16%|█▌        | 5/32 [00:04<00:23,  1.17it/s]
Flavor chain:  19%|█▉        | 6/32 [00:05<00:22,  1.15it/s]
Flavor chain:  22%|██▏       | 7/32 [00:06<00:21,  1.14it/s]
Flavor chain:  25%|██▌       | 8/32 [00:06<00:21,  1.13it/s]
Flavor chain:  28%|██▊       | 9/32 [00:07<00:20,  1.11it/s]
Flavor chain:  31%|███▏      | 10/32 [00:08<00:20,  1.09it/s]
Flavor chain:  34%|███▍      | 11/32 [00:09<00:19,  1.07it/s]
Flavor chain:  38%|███▊      | 12/32 [00:10<00:18,  1.06it/s]
Flavor chain:  41%|████      | 13/32 [00:11<00:18,  1.04it/s]
Flavor chain:  44%|████▍     | 14/32 [00:12<00:17,  1.02it/s]
Flavor chain:  44%|████▍     | 14/32 [00:13<00:17,  1.01it/s]
  0%|          | 0/55 [00:00<?, ?it/s]
 78%|███████▊  | 43/55 [00:00<00:00, 429.39it/s]
100%|██████████| 55/55 [00:00<00:00, 434.66it/s]
  0%|          | 0/6 [00:00<?, ?it/s]
100%|██████████| 6/6 [00:00<00:00, 317.31it/s]
  0%|          | 0/50 [00:00<?, ?it/s]
 90%|█████████ | 45/50 [00:00<00:00, 447.75it/s]
100%|██████████| 50/50 [00:00<00:00, 452.42it/s]
Prompt: painting of a child holding a bird in a field with a ball, style of picasso, white dove, macaron, midday photograph, platypus, listing image, nagasaki, guggimon, by Christian Jane Fergusson, head in hands, bluejay, single color, holding a white flag, nouvelle vague
Enabling style adapter.
Enabling openpose adapter.
Data shape for DDIM sampling is (1, 4, 96, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler:   0%|          | 0/50 [00:00<?, ?it/s]
DDIM Sampler:   2%|▏         | 1/50 [00:00<00:12,  4.04it/s]
DDIM Sampler:   4%|▍         | 2/50 [00:00<00:11,  4.17it/s]
DDIM Sampler:   6%|▌         | 3/50 [00:00<00:11,  4.23it/s]
DDIM Sampler:   8%|▊         | 4/50 [00:00<00:10,  4.25it/s]
DDIM Sampler:  10%|█         | 5/50 [00:01<00:10,  4.26it/s]
DDIM Sampler:  12%|█▏        | 6/50 [00:01<00:10,  4.27it/s]
DDIM Sampler:  14%|█▍        | 7/50 [00:01<00:10,  4.28it/s]
DDIM Sampler:  16%|█▌        | 8/50 [00:01<00:09,  4.28it/s]
DDIM Sampler:  18%|█▊        | 9/50 [00:02<00:09,  4.29it/s]
DDIM Sampler:  20%|██        | 10/50 [00:02<00:09,  4.28it/s]
DDIM Sampler:  22%|██▏       | 11/50 [00:02<00:09,  4.29it/s]
DDIM Sampler:  24%|██▍       | 12/50 [00:02<00:08,  4.29it/s]
DDIM Sampler:  26%|██▌       | 13/50 [00:03<00:08,  4.29it/s]
DDIM Sampler:  28%|██▊       | 14/50 [00:03<00:08,  4.29it/s]
DDIM Sampler:  30%|███       | 15/50 [00:03<00:08,  4.29it/s]
DDIM Sampler:  32%|███▏      | 16/50 [00:03<00:07,  4.29it/s]
DDIM Sampler:  34%|███▍      | 17/50 [00:03<00:07,  4.29it/s]
DDIM Sampler:  36%|███▌      | 18/50 [00:04<00:07,  4.29it/s]
DDIM Sampler:  38%|███▊      | 19/50 [00:04<00:07,  4.29it/s]
DDIM Sampler:  40%|████      | 20/50 [00:04<00:06,  4.29it/s]
DDIM Sampler:  42%|████▏     | 21/50 [00:04<00:06,  4.29it/s]
DDIM Sampler:  44%|████▍     | 22/50 [00:05<00:06,  4.29it/s]
DDIM Sampler:  46%|████▌     | 23/50 [00:05<00:06,  4.29it/s]
DDIM Sampler:  48%|████▊     | 24/50 [00:05<00:06,  4.28it/s]
DDIM Sampler:  50%|█████     | 25/50 [00:05<00:05,  4.29it/s]
DDIM Sampler:  52%|█████▏    | 26/50 [00:06<00:05,  4.29it/s]
DDIM Sampler:  54%|█████▍    | 27/50 [00:06<00:05,  4.29it/s]
DDIM Sampler:  56%|█████▌    | 28/50 [00:06<00:05,  4.29it/s]
DDIM Sampler:  58%|█████▊    | 29/50 [00:06<00:04,  4.29it/s]
DDIM Sampler:  60%|██████    | 30/50 [00:07<00:04,  4.29it/s]
DDIM Sampler:  62%|██████▏   | 31/50 [00:07<00:04,  4.28it/s]
DDIM Sampler:  64%|██████▍   | 32/50 [00:07<00:04,  4.28it/s]
DDIM Sampler:  66%|██████▌   | 33/50 [00:07<00:03,  4.29it/s]
DDIM Sampler:  68%|██████▊   | 34/50 [00:07<00:03,  4.29it/s]
DDIM Sampler:  70%|███████   | 35/50 [00:08<00:03,  4.29it/s]
DDIM Sampler:  72%|███████▏  | 36/50 [00:08<00:03,  4.29it/s]
DDIM Sampler:  74%|███████▍  | 37/50 [00:08<00:03,  4.29it/s]
DDIM Sampler:  76%|███████▌  | 38/50 [00:08<00:02,  4.29it/s]
DDIM Sampler:  78%|███████▊  | 39/50 [00:09<00:02,  4.29it/s]
DDIM Sampler:  80%|████████  | 40/50 [00:09<00:02,  4.29it/s]
DDIM Sampler:  82%|████████▏ | 41/50 [00:09<00:02,  4.29it/s]
DDIM Sampler:  84%|████████▍ | 42/50 [00:09<00:01,  4.29it/s]
DDIM Sampler:  86%|████████▌ | 43/50 [00:10<00:01,  4.29it/s]
DDIM Sampler:  88%|████████▊ | 44/50 [00:10<00:01,  4.29it/s]
DDIM Sampler:  90%|█████████ | 45/50 [00:10<00:01,  4.29it/s]
DDIM Sampler:  92%|█████████▏| 46/50 [00:10<00:00,  4.29it/s]
DDIM Sampler:  94%|█████████▍| 47/50 [00:10<00:00,  4.29it/s]
DDIM Sampler:  96%|█████████▌| 48/50 [00:11<00:00,  4.29it/s]
DDIM Sampler:  98%|█████████▊| 49/50 [00:11<00:00,  4.30it/s]
DDIM Sampler: 100%|██████████| 50/50 [00:11<00:00,  4.30it/s]
DDIM Sampler: 100%|██████████| 50/50 [00:11<00:00,  4.28it/s]
Generated outputs/image0-steps50-cfg8.0-face_enhance.png
Running upscaler with face enhancement
Image 0, upscaling 1/2, 2.0x
Image 0, upscaling 2/2, 4.0x
Written poseimg to outputs/openpose.png
Done. 2 image(s).
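
The log line `Data shape for DDIM sampling is (1, 4, 96, 64)` gives the latent tensor shape used during sampling. As a worked example, and assuming the usual Stable Diffusion 1.x layout of (batch, channels, height, width) with latents that are 8 times smaller than the image along each side, the pre-upscaling image size can be derived as follows.

```python
VAE_FACTOR = 8  # assumed: Stable Diffusion 1.x downsampling factor per side


def latent_to_pixels(shape):
    """Map a (batch, channels, height, width) latent shape to (height, width) in pixels."""
    _batch, _channels, latent_h, latent_w = shape
    return latent_h * VAE_FACTOR, latent_w * VAE_FACTOR


height, width = latent_to_pixels((1, 4, 96, 64))
print(height, width)  # 768 512
```

Under these assumptions the sampler produced a 512 x 768 (width x height) portrait image before the upscaling passes reported in the log.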
Version Details
Version ID
1833bc8044109a1d7767f6bcffa05e55fbf35eccba8a2f1e9b1be16b9ca56801
Version Created
September 25, 2023