zsxkib/instant-id 🔢🖼️📝❓✓ → 🖼️

▶️ 983.3K runs 📅 Jan 2024 ⚙️ Cog 0.13.6
image-consistent-character-generation image-to-image

About

Make realistic images of real people instantly

Example Output

Prompt:

"analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality"

Output

[Image: example output]

Performance Metrics

Prediction Time: 36.02s
Total Time: 36.06s
All Input Parameters
{
  "image": "https://replicate.delivery/pbxt/KIIutO7jIleskKaWebhvurgBUlHR6M6KN7KHaMMWSt4OnVrF/musk_resize.jpeg",
  "prompt": "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality",
  "scheduler": "EulerDiscreteScheduler",
  "enable_lcm": false,
  "pose_image": "https://replicate.delivery/pbxt/KJmFdQRQVDXGDVdVXftLvFrrvgOPXXRXbzIVEyExPYYOFPyF/80048a6e6586759dbcb529e74a9042ca.jpeg",
  "num_outputs": 1,
  "sdxl_weights": "protovision-xl-high-fidel",
  "output_format": "webp",
  "pose_strength": 0.4,
  "canny_strength": 0.3,
  "depth_strength": 0.5,
  "guidance_scale": 5,
  "output_quality": 80,
  "negative_prompt": "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured (lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch,deformed, mutated, cross-eyed, ugly, disfigured",
  "ip_adapter_scale": 0.8,
  "lcm_guidance_scale": 1.5,
  "num_inference_steps": 30,
  "enable_pose_controlnet": true,
  "enhance_nonface_region": true,
  "enable_canny_controlnet": false,
  "enable_depth_controlnet": false,
  "lcm_num_inference_steps": 5,
  "face_detection_input_width": 640,
  "face_detection_input_height": 640,
  "controlnet_conditioning_scale": 0.8
}
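The payload above can also be submitted from code. The following is a minimal sketch, assuming the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set in the environment. The helper names `build_input` and `run_instant_id` are hypothetical, the face image URL is a placeholder, and the model reference is assembled from the model name and the version ID listed under Version Details.

```python
"""Sketch: submit an InstantID prediction with the `replicate` Python client."""

# Model name plus the version ID from the Version Details section of this page.
MODEL_REF = (
    "zsxkib/instant-id:"
    "2e4785a4d80dadf580077b2244c8d7c05d8e3faac04a04c02d8e099dd2876789"
)


def build_input(image_url, prompt, pose_image_url=None, **overrides):
    """Assemble an input payload; keys left out fall back to the model defaults."""
    if not image_url:
        raise ValueError("`image` is required")
    payload = {"image": image_url, "prompt": prompt}
    if pose_image_url is not None:
        payload["pose_image"] = pose_image_url
    payload.update(overrides)  # e.g. guidance_scale=5, sdxl_weights="..."
    return payload


def run_instant_id(payload):
    """Run the prediction (hypothetical helper wrapping `replicate.run`)."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(MODEL_REF, input=payload)


payload = build_input(
    "https://example.com/face.jpg",  # placeholder, replace with a real face image
    "analog film photo of a man, 35mm photo, grainy, vintage",
    sdxl_weights="protovision-xl-high-fidel",
    guidance_scale=5,
    num_inference_steps=30,
)
print(payload)
# outputs = run_instant_id(payload)  # uncomment to call the API (network access, billed)
```

Only `image` is marked as required, so the sketch sends just the keys it overrides and leaves every other setting at its default.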
Input Parameters
seed Type: integer
Random seed. Leave blank to randomize the seed.
image (required) Type: string
Input face image.
prompt Type: string, Default: a person
Input prompt.
scheduler Default: EulerDiscreteScheduler
Scheduler.
enable_lcm Type: boolean, Default: false
Enable fast inference with LCM (Latent Consistency Models). Speeds up inference at some cost to the quality of the generated image. Performs better with close-up portrait face images.
pose_image Type: string
(Optional) Reference pose image.
num_outputs Type: integer, Default: 1, Range: 1 - 8
Number of images to output.
sdxl_weights Default: stable-diffusion-xl-base-1.0
Pick which base weights you want to use.
output_format Default: webp
Format of the output images.
pose_strength Type: number, Default: 0.4, Range: 0 - 1
OpenPose ControlNet strength. Effective only if `enable_pose_controlnet` is true.
canny_strength Type: number, Default: 0.3, Range: 0 - 1
Canny ControlNet strength. Effective only if `enable_canny_controlnet` is true.
depth_strength Type: number, Default: 0.5, Range: 0 - 1
Depth ControlNet strength. Effective only if `enable_depth_controlnet` is true.
guidance_scale Type: number, Default: 7.5, Range: 1 - 50
Scale for classifier-free guidance.
output_quality Type: integer, Default: 80, Range: 0 - 100
Quality of the output images, from 0 (lowest) to 100 (best).
negative_prompt Type: string, Default: (empty)
Input negative prompt.
ip_adapter_scale Type: number, Default: 0.8, Range: 0 - 1.5
Scale for image adapter strength (for detail).
lcm_guidance_scale Type: number, Default: 1.5, Range: 1 - 20
Scale for classifier-free guidance when using LCM. Only used when `enable_lcm` is true.
num_inference_steps Type: integer, Default: 30, Range: 1 - 500
Number of denoising steps.
disable_safety_checker Type: boolean, Default: false
Disable the safety checker for generated images.
enable_pose_controlnet Type: boolean, Default: true
Enable OpenPose ControlNet. If set to false, `pose_strength` is ignored.
enhance_nonface_region Type: boolean, Default: true
Enhance the non-face region.
enable_canny_controlnet Type: boolean, Default: false
Enable Canny ControlNet. If set to false, `canny_strength` is ignored.
enable_depth_controlnet Type: boolean, Default: false
Enable Depth ControlNet. If set to false, `depth_strength` is ignored.
lcm_num_inference_steps Type: integer, Default: 5, Range: 1 - 10
Number of denoising steps when using LCM. Only used when `enable_lcm` is true.
face_detection_input_width Type: integer, Default: 640, Range: 640 - 4096
Width of the input image for face detection.
face_detection_input_height Type: integer, Default: 640, Range: 640 - 4096
Height of the input image for face detection.
controlnet_conditioning_scale Type: number, Default: 0.8, Range: 0 - 1.5
Scale for IdentityNet strength (for fidelity).
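The documented ranges and the LCM switch can be checked on the client before a prediction is sent. The sketch below is illustrative only: `RANGES` and `DEFAULTS` are copied from the parameter list above, the helper names `out_of_range` and `sampling_settings` are hypothetical, and the assumption that the non-LCM values apply whenever `enable_lcm` is false is inferred from the parameter descriptions rather than confirmed from the model's code.

```python
# Ranges copied from the parameter list: name -> (minimum, maximum).
RANGES = {
    "num_outputs": (1, 8),
    "pose_strength": (0, 1),
    "canny_strength": (0, 1),
    "depth_strength": (0, 1),
    "guidance_scale": (1, 50),
    "output_quality": (0, 100),
    "ip_adapter_scale": (0, 1.5),
    "lcm_guidance_scale": (1, 20),
    "num_inference_steps": (1, 500),
    "lcm_num_inference_steps": (1, 10),
    "face_detection_input_width": (640, 4096),
    "face_detection_input_height": (640, 4096),
    "controlnet_conditioning_scale": (0, 1.5),
}

# Defaults copied from the parameter list, for the settings resolved below.
DEFAULTS = {
    "enable_lcm": False,
    "guidance_scale": 7.5,
    "num_inference_steps": 30,
    "lcm_guidance_scale": 1.5,
    "lcm_num_inference_steps": 5,
}


def out_of_range(payload):
    """Return one message per value that falls outside its documented range."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in payload and not low <= payload[name] <= high:
            problems.append(f"{name}={payload[name]} is outside {low} - {high}")
    return problems


def sampling_settings(payload):
    """Return the (steps, guidance) pair that the descriptions imply is used."""
    settings = {**DEFAULTS, **payload}
    if settings["enable_lcm"]:
        # Assumption: the lcm_* values replace the regular ones in LCM mode.
        return settings["lcm_num_inference_steps"], settings["lcm_guidance_scale"]
    return settings["num_inference_steps"], settings["guidance_scale"]


example = {"enable_lcm": True, "guidance_scale": 5, "pose_strength": 1.2}
print(out_of_range(example))
print(sampling_settings(example))
```

A client-side check like this only gives earlier feedback; the service applies its own validation to every request.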
Output Schema

Output

Type: array, Items Type: string, Items Format: uri
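Because the output is an array of URIs, a caller usually has to pick local filenames and download each item. The following is a rough sketch using only the Python standard library; the helper names `local_names` and `download_all`, the `instant_id` prefix, and the sample URLs are all hypothetical, and each item is assumed to convert cleanly to a URL string with `str()`.

```python
import posixpath
from urllib.parse import urlparse


def local_names(output_uris, prefix="instant_id"):
    """Map each output URI to a local filename, keeping its file extension."""
    names = []
    for index, uri in enumerate(output_uris):
        path = urlparse(str(uri)).path
        # Fall back to .webp, the documented default for output_format.
        extension = posixpath.splitext(path)[1] or ".webp"
        names.append(f"{prefix}_{index}{extension}")
    return names


def download_all(output_uris, prefix="instant_id"):
    """Download every output image into the current directory."""
    from urllib.request import urlretrieve

    for uri, name in zip(output_uris, local_names(output_uris, prefix)):
        urlretrieve(str(uri), name)


# Hypothetical output array; real URIs come back from the prediction.
sample = [
    "https://example.com/outputs/out_0.webp",
    "https://example.com/outputs/out_1.png",
]
print(local_names(sample))
```

With `num_outputs` greater than 1 the array holds one URI per image, which is why the filenames are indexed.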

Example Execution Logs
Using seed: 25815
[~] Loading new SDXL weights: checkpoints/models--stablediffusionapi--protovision-xl-high-fidel/
Keyword arguments {'safety_checker': None} are not expected by StableDiffusionXLInstantIDPipeline and will be ignored.
Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]
Loading pipeline components...:  29%|██▊       | 2/7 [00:00<00:00,  5.96it/s]
Loading pipeline components...:  43%|████▎     | 3/7 [00:00<00:01,  3.94it/s]
Loading pipeline components...:  71%|███████▏  | 5/7 [00:04<00:02,  1.15s/it]
Loading pipeline components...: 100%|██████████| 7/7 [00:04<00:00,  1.48it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:04<00:00,  1.50it/s]
[~] Seting up LCM (just in case)
Start inference...
[Debug] Prompt: analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality,
[Debug] Neg Prompt: (lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured (lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch,deformed, mutated, cross-eyed, ugly, disfigured
  0%|          | 0/30 [00:00<?, ?it/s]
  3%|▎         | 1/30 [00:00<00:07,  4.12it/s]
  7%|▋         | 2/30 [00:00<00:06,  4.12it/s]
 10%|█         | 3/30 [00:00<00:06,  4.13it/s]
 13%|█▎        | 4/30 [00:00<00:06,  4.12it/s]
 17%|█▋        | 5/30 [00:01<00:06,  4.12it/s]
 20%|██        | 6/30 [00:01<00:05,  4.12it/s]
 23%|██▎       | 7/30 [00:01<00:05,  4.12it/s]
 27%|██▋       | 8/30 [00:01<00:05,  4.12it/s]
 30%|███       | 9/30 [00:02<00:05,  4.12it/s]
 33%|███▎      | 10/30 [00:02<00:04,  4.12it/s]
 37%|███▋      | 11/30 [00:02<00:04,  4.12it/s]
 40%|████      | 12/30 [00:02<00:04,  4.12it/s]
 43%|████▎     | 13/30 [00:03<00:04,  4.12it/s]
 47%|████▋     | 14/30 [00:03<00:03,  4.12it/s]
 50%|█████     | 15/30 [00:03<00:03,  4.12it/s]
 53%|█████▎    | 16/30 [00:03<00:03,  4.12it/s]
 57%|█████▋    | 17/30 [00:04<00:03,  4.12it/s]
 60%|██████    | 18/30 [00:04<00:02,  4.12it/s]
 63%|██████▎   | 19/30 [00:04<00:02,  4.11it/s]
 67%|██████▋   | 20/30 [00:04<00:02,  4.11it/s]
 70%|███████   | 21/30 [00:05<00:02,  4.10it/s]
 73%|███████▎  | 22/30 [00:05<00:01,  4.10it/s]
 77%|███████▋  | 23/30 [00:05<00:01,  4.10it/s]
 80%|████████  | 24/30 [00:05<00:01,  4.09it/s]
 83%|████████▎ | 25/30 [00:06<00:01,  4.10it/s]
 87%|████████▋ | 26/30 [00:06<00:00,  4.10it/s]
 90%|█████████ | 27/30 [00:06<00:00,  4.10it/s]
 93%|█████████▎| 28/30 [00:06<00:00,  4.09it/s]
 97%|█████████▋| 29/30 [00:07<00:00,  4.09it/s]
100%|██████████| 30/30 [00:07<00:00,  4.09it/s]
100%|██████████| 30/30 [00:07<00:00,  4.11it/s]
NSFW content detected: False
[~] Saving to /tmp/out_0.webp...
[~] Output format: WEBP
[~] Output quality: 80
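The progress lines in these logs can be turned into a rough timing estimate. The sketch below parses the tail of a single progress line and multiplies the step count by the time per step. The helper names `step_rate` and `estimated_seconds` are hypothetical, and the regular expression only assumes the line layout visible in the logs above.

```python
import re

# Matches the tail of a progress line such as "30/30 [00:07<00:00,  4.11it/s]".
PROGRESS = re.compile(r"(\d+)/(\d+) \[[^\]]*?,\s*([\d.]+)(it/s|s/it)\]")


def step_rate(line):
    """Return (done, total, seconds_per_step), or None if no rate is reported."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    done, total, rate, unit = match.groups()
    value = float(rate)
    if value <= 0:
        return None
    seconds_per_step = 1.0 / value if unit == "it/s" else value
    return int(done), int(total), seconds_per_step


def estimated_seconds(line):
    """Estimate the time for the whole loop from one progress line."""
    parsed = step_rate(line)
    if parsed is None:
        return None
    _, total, seconds_per_step = parsed
    return total * seconds_per_step


final_line = "100%|██████████| 30/30 [00:07<00:00,  4.11it/s]"
print(round(estimated_seconds(final_line), 1))
```

For the final line of this run, 30 steps at 4.11 it/s come to roughly 7.3 seconds, so most of the 36.02s prediction time falls outside the denoising loop. The logs show about 4 seconds of that going to loading pipeline components for the new SDXL weights.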
Version Details
Version ID
2e4785a4d80dadf580077b2244c8d7c05d8e3faac04a04c02d8e099dd2876789
Version Created
December 11, 2024