zsxkib/instant-id 🔢🖼️📝❓✓ → 🖼️

▶️ 983.3K runs 📅 Jan 2024 ⚙️ Cog 0.13.6 🔗 GitHub 📄 Paper ⚖️ License

image-consistent-character-generation image-to-image

About

Make realistic images of real people instantly

Example Output

Prompt:

"analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality"

Performance Metrics

36.02s Prediction Time

36.06s Total Time

All Input Parameters

{
  "image": "https://replicate.delivery/pbxt/KIIutO7jIleskKaWebhvurgBUlHR6M6KN7KHaMMWSt4OnVrF/musk_resize.jpeg",
  "prompt": "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality",
  "scheduler": "EulerDiscreteScheduler",
  "enable_lcm": false,
  "pose_image": "https://replicate.delivery/pbxt/KJmFdQRQVDXGDVdVXftLvFrrvgOPXXRXbzIVEyExPYYOFPyF/80048a6e6586759dbcb529e74a9042ca.jpeg",
  "num_outputs": 1,
  "sdxl_weights": "protovision-xl-high-fidel",
  "output_format": "webp",
  "pose_strength": 0.4,
  "canny_strength": 0.3,
  "depth_strength": 0.5,
  "guidance_scale": 5,
  "output_quality": 80,
  "negative_prompt": "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured (lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch,deformed, mutated, cross-eyed, ugly, disfigured",
  "ip_adapter_scale": 0.8,
  "lcm_guidance_scale": 1.5,
  "num_inference_steps": 30,
  "enable_pose_controlnet": true,
  "enhance_nonface_region": true,
  "enable_canny_controlnet": false,
  "enable_depth_controlnet": false,
  "lcm_num_inference_steps": 5,
  "face_detection_input_width": 640,
  "face_detection_input_height": 640,
  "controlnet_conditioning_scale": 0.8
}
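
The parameters above could be submitted from Python roughly as follows. This is a minimal sketch, not the model author's own client code: it assumes the `replicate` package is installed and `REPLICATE_API_TOKEN` is set, it pins the version ID listed under Version Details, and `build_input` and `run_instant_id` are hypothetical helper names introduced here for illustration.

```python
from typing import Any, Dict, List

MODEL = "zsxkib/instant-id"
VERSION = "2e4785a4d80dadf580077b2244c8d7c05d8e3faac04a04c02d8e099dd2876789"


def build_input(image: str, prompt: str, **overrides: Any) -> Dict[str, Any]:
    """Assemble an input payload; keys left out fall back to the model's defaults."""
    payload: Dict[str, Any] = {"image": image, "prompt": prompt}
    payload.update(overrides)
    return payload


def run_instant_id(payload: Dict[str, Any]) -> List[Any]:
    """Submit the payload to Replicate and return the list of outputs."""
    # Imported lazily so build_input() stays usable without the client installed.
    import replicate  # assumption: `pip install replicate`, REPLICATE_API_TOKEN set

    return list(replicate.run(f"{MODEL}:{VERSION}", input=payload))


example = build_input(
    image="https://replicate.delivery/pbxt/KIIutO7jIleskKaWebhvurgBUlHR6M6KN7KHaMMWSt4OnVrF/musk_resize.jpeg",
    prompt="analog film photo of a man. faded film, desaturated, 35mm photo",
    sdxl_weights="protovision-xl-high-fidel",
    guidance_scale=5,
    enable_pose_controlnet=True,
    pose_strength=0.4,
)
# outputs = run_instant_id(example)  # uncomment to make a real, billed API call
```

Leaving `seed` out of the payload lets the model pick a random one, as in the example run.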

Input Parameters

seed (Type: integer): Random seed. Leave blank to randomize the seed
image (required, Type: string): Input face image
prompt (Type: string, Default: a person): Input prompt
scheduler (Default: EulerDiscreteScheduler): Scheduler
enable_lcm (Type: boolean, Default: false): Enable Fast Inference with LCM (Latent Consistency Models) - speeds up inference steps, trade-off is the quality of the generated image. Performs better with close-up portrait face images
pose_image (Type: string): (Optional) reference pose image
num_outputs (Type: integer, Default: 1, Range: 1 - 8): Number of images to output
sdxl_weights (Default: stable-diffusion-xl-base-1.0): Pick which base weights you want to use
output_format (Default: webp): Format of the output images
pose_strength (Type: number, Default: 0.4, Range: 0 - 1): Openpose ControlNet strength, effective only if `enable_pose_controlnet` is true
canny_strength (Type: number, Default: 0.3, Range: 0 - 1): Canny ControlNet strength, effective only if `enable_canny_controlnet` is true
depth_strength (Type: number, Default: 0.5, Range: 0 - 1): Depth ControlNet strength, effective only if `enable_depth_controlnet` is true
guidance_scale (Type: number, Default: 7.5, Range: 1 - 50): Scale for classifier-free guidance
output_quality (Type: integer, Default: 80, Range: 0 - 100): Quality of the output images, from 0 to 100. 100 is best quality, 0 is lowest quality.
negative_prompt (Type: string, Default: empty): Input negative prompt
ip_adapter_scale (Type: number, Default: 0.8, Range: 0 - 1.5): Scale for image adapter strength (for detail)
lcm_guidance_scale (Type: number, Default: 1.5, Range: 1 - 20): Only used when `enable_lcm` is set to true. Scale for classifier-free guidance when using LCM
num_inference_steps (Type: integer, Default: 30, Range: 1 - 500): Number of denoising steps
disable_safety_checker (Type: boolean, Default: false): Disable safety checker for generated images
enable_pose_controlnet (Type: boolean, Default: true): Enable Openpose ControlNet, overrides strength if set to false
enhance_nonface_region (Type: boolean, Default: true): Enhance non-face region
enable_canny_controlnet (Type: boolean, Default: false): Enable Canny ControlNet, overrides strength if set to false
enable_depth_controlnet (Type: boolean, Default: false): Enable Depth ControlNet, overrides strength if set to false
lcm_num_inference_steps (Type: integer, Default: 5, Range: 1 - 10): Only used when `enable_lcm` is set to true. Number of denoising steps when using LCM
face_detection_input_width (Type: integer, Default: 640, Range: 640 - 4096): Width of the input image for face detection
face_detection_input_height (Type: integer, Default: 640, Range: 640 - 4096): Height of the input image for face detection
controlnet_conditioning_scale (Type: number, Default: 0.8, Range: 0 - 1.5): Scale for IdentityNet strength (for fidelity)
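
Because most numeric parameters carry a documented range, a payload can be checked locally before it is sent. The sketch below copies the ranges and ControlNet defaults from the list above. `check_input` and `inactive_strengths` are hypothetical helpers, not part of the model or of any Replicate library, and the server's own validation may differ.

```python
from typing import Any, Dict, List, Tuple

# (min, max) pairs copied from the parameter list.
RANGES: Dict[str, Tuple[float, float]] = {
    "num_outputs": (1, 8),
    "pose_strength": (0, 1),
    "canny_strength": (0, 1),
    "depth_strength": (0, 1),
    "guidance_scale": (1, 50),
    "output_quality": (0, 100),
    "ip_adapter_scale": (0, 1.5),
    "lcm_guidance_scale": (1, 20),
    "num_inference_steps": (1, 500),
    "lcm_num_inference_steps": (1, 10),
    "face_detection_input_width": (640, 4096),
    "face_detection_input_height": (640, 4096),
    "controlnet_conditioning_scale": (0, 1.5),
}

# strength parameter -> (enable flag, documented default of that flag)
STRENGTH_FLAGS: Dict[str, Tuple[str, bool]] = {
    "pose_strength": ("enable_pose_controlnet", True),
    "canny_strength": ("enable_canny_controlnet", False),
    "depth_strength": ("enable_depth_controlnet", False),
}


def check_input(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    if "image" not in payload:
        problems.append("image is required")
    for name, (low, high) in RANGES.items():
        if name in payload and not low <= payload[name] <= high:
            problems.append(f"{name}={payload[name]} is outside {low} - {high}")
    return problems


def inactive_strengths(payload: Dict[str, Any]) -> List[str]:
    """Strength values that are set although their ControlNet is disabled."""
    return [
        strength
        for strength, (flag, default) in STRENGTH_FLAGS.items()
        if strength in payload and not payload.get(flag, default)
    ]
```

In the example run, `canny_strength` and `depth_strength` are set but their ControlNets are disabled, so `inactive_strengths` would report both as having no effect.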

Output Schema

Output

Type: array • Items Type: string • Items Format: uri
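
Since the output is an array of URIs, saving the results amounts to downloading each entry. The following is a sketch under stated assumptions: `local_name` and `save_outputs` are hypothetical helpers, the file naming scheme is arbitrary, and each item is converted with `str()` on the assumption that it is either a URI string (as the schema says) or a client object whose string form is its URL.

```python
import os
from typing import Any, Iterable, List
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_name(uri: str, index: int) -> str:
    """Derive a local file name, keeping the extension found in the URI path."""
    # Falls back to .webp, the documented default for output_format.
    ext = os.path.splitext(urlparse(uri).path)[1] or ".webp"
    return f"instant_id_{index}{ext}"


def save_outputs(outputs: Iterable[Any], directory: str = ".") -> List[str]:
    """Download every output URI into `directory` and return the saved paths."""
    paths: List[str] = []
    for index, item in enumerate(outputs):
        uri = str(item)  # assumption: the item is, or stringifies to, its URI
        path = os.path.join(directory, local_name(uri, index))
        urlretrieve(uri, path)  # performs a network request
        paths.append(path)
    return paths
```

With `num_outputs` greater than 1, the array holds one URI per generated image, so the index keeps the file names distinct.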

Example Execution Logs

Using seed: 25815
[~] Loading new SDXL weights: checkpoints/models--stablediffusionapi--protovision-xl-high-fidel/
Keyword arguments {'safety_checker': None} are not expected by StableDiffusionXLInstantIDPipeline and will be ignored.
Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]
Loading pipeline components...:  29%|██▊       | 2/7 [00:00<00:00,  5.96it/s]
Loading pipeline components...:  43%|████▎     | 3/7 [00:00<00:01,  3.94it/s]
Loading pipeline components...:  71%|███████▏  | 5/7 [00:04<00:02,  1.15s/it]
Loading pipeline components...: 100%|██████████| 7/7 [00:04<00:00,  1.48it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:04<00:00,  1.50it/s]
[~] Seting up LCM (just in case)
Start inference...
[Debug] Prompt: analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality,
[Debug] Neg Prompt: (lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured (lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch,deformed, mutated, cross-eyed, ugly, disfigured
  0%|          | 0/30 [00:00<?, ?it/s]
  3%|▎         | 1/30 [00:00<00:07,  4.12it/s]
  7%|▋         | 2/30 [00:00<00:06,  4.12it/s]
 10%|█         | 3/30 [00:00<00:06,  4.13it/s]
 13%|█▎        | 4/30 [00:00<00:06,  4.12it/s]
 17%|█▋        | 5/30 [00:01<00:06,  4.12it/s]
 20%|██        | 6/30 [00:01<00:05,  4.12it/s]
 23%|██▎       | 7/30 [00:01<00:05,  4.12it/s]
 27%|██▋       | 8/30 [00:01<00:05,  4.12it/s]
 30%|███       | 9/30 [00:02<00:05,  4.12it/s]
 33%|███▎      | 10/30 [00:02<00:04,  4.12it/s]
 37%|███▋      | 11/30 [00:02<00:04,  4.12it/s]
 40%|████      | 12/30 [00:02<00:04,  4.12it/s]
 43%|████▎     | 13/30 [00:03<00:04,  4.12it/s]
 47%|████▋     | 14/30 [00:03<00:03,  4.12it/s]
 50%|█████     | 15/30 [00:03<00:03,  4.12it/s]
 53%|█████▎    | 16/30 [00:03<00:03,  4.12it/s]
 57%|█████▋    | 17/30 [00:04<00:03,  4.12it/s]
 60%|██████    | 18/30 [00:04<00:02,  4.12it/s]
 63%|██████▎   | 19/30 [00:04<00:02,  4.11it/s]
 67%|██████▋   | 20/30 [00:04<00:02,  4.11it/s]
 70%|███████   | 21/30 [00:05<00:02,  4.10it/s]
 73%|███████▎  | 22/30 [00:05<00:01,  4.10it/s]
 77%|███████▋  | 23/30 [00:05<00:01,  4.10it/s]
 80%|████████  | 24/30 [00:05<00:01,  4.09it/s]
 83%|████████▎ | 25/30 [00:06<00:01,  4.10it/s]
 87%|████████▋ | 26/30 [00:06<00:00,  4.10it/s]
 90%|█████████ | 27/30 [00:06<00:00,  4.10it/s]
 93%|█████████▎| 28/30 [00:06<00:00,  4.09it/s]
 97%|█████████▋| 29/30 [00:07<00:00,  4.09it/s]
100%|██████████| 30/30 [00:07<00:00,  4.09it/s]
100%|██████████| 30/30 [00:07<00:00,  4.11it/s]
NSFW content detected: False
[~] Saving to /tmp/out_0.webp...
[~] Output format: WEBP
[~] Output quality: 80
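
When `seed` is left blank, the logs are where the randomly chosen value shows up ("Using seed: 25815" above). A small sketch of recovering it so it can be passed back as `seed` on a later run; `seed_from_logs` is a hypothetical helper, and the log format is assumed to stay as shown in this example.

```python
import re
from typing import Optional

# Matches the "Using seed: N" line as it appears in the example logs.
SEED_LINE = re.compile(r"^Using seed: (\d+)", re.MULTILINE)


def seed_from_logs(logs: str) -> Optional[int]:
    """Return the seed reported in the logs, or None if no such line exists."""
    match = SEED_LINE.search(logs)
    return int(match.group(1)) if match else None


sample = "Using seed: 25815\nStart inference...\nNSFW content detected: False\n"
recovered = seed_from_logs(sample)
```

Reusing the recovered seed with otherwise identical inputs should make a run repeatable, though that depends on the model and hardware behaving deterministically.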

Version Details

Version ID: 2e4785a4d80dadf580077b2244c8d7c05d8e3faac04a04c02d8e099dd2876789
Version Created: December 11, 2024
