jyoung105/stable-cascade 🔢📝 → 🖼️

▶️ 81 runs 📅 Sep 2024 ⚙️ Cog 0.13.2 🔗 GitHub 📄 Paper ⚖️ License
text-to-image

About

Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

Example Output

Prompt:

"A man with hoodie on, illustration"

Output

Example output

Performance Metrics

14.02s Prediction Time
139.58s Total Time
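The gap between the two figures is time spent outside the prediction itself. The short sketch below only does that subtraction; attributing the gap to queueing and model startup is an assumption, since the page does not break it down.

```python
# Figures copied from the Performance Metrics above.
prediction_time_s = 14.02
total_time_s = 139.58

# Time not spent in the prediction itself.
# Assumption: this is mostly queueing and startup; the page does not say.
overhead_s = total_time_s - prediction_time_s
overhead_share = overhead_s / total_time_s

print(f"overhead: {overhead_s:.2f}s ({overhead_share:.0%} of total)")
```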
All Input Parameters
{
  "width": 1024,
  "height": 1024,
  "prompt": "A man with hoodie on, illustration",
  "num_images": 1,
  "steps_prior": 20,
  "steps_decoder": 10,
  "guidance_scale_prior": 4,
  "guidance_scale_decoder": 0
}
Input Parameters
seed Type: integer
Random seed. Leave blank to randomize the seed.
width Type: integer • Default: 1024 • Range: 1 - 2048
Width of the output image.
height Type: integer • Default: 1024 • Range: 1 - 2048
Height of the output image.
prompt Type: string
Input prompt: text describing what you want to generate.
num_images Type: integer • Default: 1 • Range: 1 - 4
Number of output images.
steps_prior Type: integer • Default: 20 • Range: 1 - 50
Number of denoising steps in the prior.
steps_decoder Type: integer • Default: 10 • Range: 1 - 50
Number of denoising steps in the decoder.
negative_prompt Type: string
Input negative prompt: text describing what you don't want to generate.
guidance_scale_prior Type: number • Default: 4 • Range: 0 - 20
Scale for classifier-free guidance in the prior.
guidance_scale_decoder Type: number • Default: 0 • Range: 0 - 20
Scale for classifier-free guidance in the decoder.
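The defaults and ranges listed above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the model's own code: `RANGES`, `UNRANGED`, and `build_input` are hypothetical names introduced here, and the values are copied from the parameter list.

```python
from typing import Any

# (default, minimum, maximum) for each ranged parameter, as listed above.
RANGES = {
    "width": (1024, 1, 2048),
    "height": (1024, 1, 2048),
    "num_images": (1, 1, 4),
    "steps_prior": (20, 1, 50),
    "steps_decoder": (10, 1, 50),
    "guidance_scale_prior": (4, 0, 20),
    "guidance_scale_decoder": (0, 0, 20),
}
# Parameters that are listed without a default or range.
UNRANGED = {"negative_prompt", "seed"}


def build_input(prompt: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge the listed defaults with overrides and check ranges."""
    payload: dict[str, Any] = {"prompt": prompt}
    payload.update({name: default for name, (default, _, _) in RANGES.items()})
    for name, value in overrides.items():
        if name not in RANGES and name not in UNRANGED:
            raise KeyError(f"unknown parameter: {name}")
        payload[name] = value
    for name, (_, low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name}={payload[name]} is outside {low} - {high}")
    return payload


if __name__ == "__main__":
    print(build_input("A man with hoodie on, illustration", steps_prior=30))
```

Leaving `seed` out of the payload matches the note above that a blank seed is randomized.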
Output Schema

Output

Type: array • Items Type: string • Items Format: uri
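As a rough illustration of consuming this schema, the sketch below calls the model through the third-party `replicate` Python client and returns the output as a list of URI strings. It assumes the client is installed and the `REPLICATE_API_TOKEN` environment variable is set; `model_ref` and `generate` are hypothetical helper names, and the version hash is the one shown under Version Details on this page. Some client versions may return file-like objects rather than plain strings, so each item is passed through `str()`, which is assumed to yield the URI.

```python
MODEL = "jyoung105/stable-cascade"
# Version ID as listed on this page.
VERSION = "36c40b9cc271abb8bc4a0f8cbb59c68d0739ad076648533eac1ca7a56d268b0d"


def model_ref() -> str:
    """Build the 'owner/name:version' reference accepted by replicate.run."""
    return f"{MODEL}:{VERSION}"


def generate(prompt: str, num_images: int = 1) -> list[str]:
    """Run one prediction and return the image URIs (performs a network call)."""
    import replicate  # third-party client, imported lazily

    output = replicate.run(
        model_ref(),
        input={"prompt": prompt, "num_images": num_images},
    )
    # The schema describes an array of URI strings; str() normalizes each item.
    return [str(item) for item in output]


if __name__ == "__main__":
    for uri in generate("A man with hoodie on, illustration"):
        print(uri)
```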

Example Execution Logs
DEVICE: cuda
DTYPE: torch.float16
Using seed: 50236
Finish setup in 0.00011014938354492188 secs.
[Debug] Prompt: A man with hoodie on, illustration, best quality, high detail, sharp focus
  0%|          | 0/20 [00:00<?, ?it/s]
  5%|▌         | 1/20 [00:01<00:29,  1.54s/it]
 10%|█         | 2/20 [00:01<00:12,  1.40it/s]
 15%|█▌        | 3/20 [00:01<00:07,  2.21it/s]
 20%|██        | 4/20 [00:01<00:05,  3.04it/s]
 25%|██▌       | 5/20 [00:02<00:03,  3.83it/s]
 30%|███       | 6/20 [00:02<00:03,  4.54it/s]
 35%|███▌      | 7/20 [00:02<00:02,  5.14it/s]
 40%|████      | 8/20 [00:02<00:02,  5.63it/s]
 45%|████▌     | 9/20 [00:02<00:01,  6.03it/s]
 50%|█████     | 10/20 [00:02<00:01,  6.33it/s]
 55%|█████▌    | 11/20 [00:02<00:01,  6.54it/s]
 60%|██████    | 12/20 [00:03<00:01,  6.70it/s]
 65%|██████▌   | 13/20 [00:03<00:01,  6.78it/s]
 70%|███████   | 14/20 [00:03<00:00,  6.89it/s]
 75%|███████▌  | 15/20 [00:03<00:00,  6.94it/s]
 80%|████████  | 16/20 [00:03<00:00,  6.97it/s]
 85%|████████▌ | 17/20 [00:03<00:00,  7.00it/s]
 90%|█████████ | 18/20 [00:03<00:00,  7.03it/s]
 95%|█████████▌| 19/20 [00:04<00:00,  7.06it/s]
100%|██████████| 20/20 [00:04<00:00,  7.07it/s]
100%|██████████| 20/20 [00:04<00:00,  4.75it/s]
  0%|          | 0/10 [00:00<?, ?it/s]
 10%|█         | 1/10 [00:01<00:17,  1.93s/it]
 20%|██        | 2/10 [00:02<00:07,  1.12it/s]
 30%|███       | 3/10 [00:02<00:03,  1.79it/s]
 40%|████      | 4/10 [00:02<00:02,  2.48it/s]
 50%|█████     | 5/10 [00:02<00:01,  3.17it/s]
 60%|██████    | 6/10 [00:02<00:01,  3.80it/s]
 70%|███████   | 7/10 [00:02<00:00,  4.34it/s]
 80%|████████  | 8/10 [00:03<00:00,  4.80it/s]
 90%|█████████ | 9/10 [00:03<00:00,  5.16it/s]
100%|██████████| 10/10 [00:03<00:00,  5.43it/s]
100%|██████████| 10/10 [00:03<00:00,  2.95it/s]
Finish generation in 12.288098096847534 secs.
Version Details
Version ID
36c40b9cc271abb8bc4a0f8cbb59c68d0739ad076648533eac1ca7a56d268b0d
Version Created
November 22, 2024