vectorspacelab/omnigen 🖼️🔢📝✓ → 🖼️

▶️ 12.7K runs 📅 Oct 2024 ⚙️ Cog 0.9.23 🔗 GitHub 📄 Paper ⚖️ License

image-editing image-to-image text-to-image

Performance

Typical run time: 125.6s

Cold start (first call): ~267s

Total runs: 12.7K

About

OmniGen: Unified Image Generation

Example Output

Prompt:

"<|image_1|> Remove the woman's earrings. Replace the mug with a clear glass filled with sparkling iced cola."

Output

Performance Metrics

125.61s Prediction Time

266.90s Total Time

All Input Parameters

{
  "img1": "https://shitao-omnigen.hf.space/gradio_api/file=/tmp/gradio/23e02261d0d5dd416702fc5645b5192af904ddd5d83e6953913720c625b40306/t2i_woman_with_book.png",
  "width": 1024,
  "height": 1024,
  "prompt": "<img><|image_1|><img> Remove the woman's earrings. Replace the mug with a clear glass filled with sparkling iced cola.",
  "offload_model": false,
  "guidance_scale": 2.5,
  "inference_steps": 50,
  "img_guidance_scale": 1.6,
  "separate_cfg_infer": true,
  "max_input_image_size": 1024,
  "use_input_image_size_as_output": true
}
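The inputs above can be sent to the model from Python. The following is a minimal sketch, assuming the official `replicate` client is installed (`pip install replicate`) and the `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_omnigen` are illustrative and not part of any API; the version hash is the one listed under Version Details, and the placeholder follows the `<img><|image_*|></img>` format given in the `prompt` parameter description.

```python
MODEL = "vectorspacelab/omnigen"
VERSION = "af66691a8952a0ce21b26e840835ad1efe176af159e10169ec5df6916338863b"


def build_input(image_url: str, instruction: str) -> dict:
    """Assemble the payload for a single-image edit (hypothetical helper)."""
    return {
        "img1": image_url,
        # The placeholder ties the instruction to the first input image.
        "prompt": f"<img><|image_1|></img> {instruction}",
        "width": 1024,
        "height": 1024,
        "offload_model": False,
        "guidance_scale": 2.5,
        "inference_steps": 50,
        "img_guidance_scale": 1.6,
        "separate_cfg_infer": True,
        "max_input_image_size": 1024,
        "use_input_image_size_as_output": True,
    }


def run_omnigen(payload: dict):
    """Send the payload to Replicate. Needs network access and an API token."""
    import replicate  # imported lazily so build_input works without the client

    return replicate.run(f"{MODEL}:{VERSION}", input=payload)
```

Calling `run_omnigen(build_input(url, "Remove the woman's earrings."))` would start a prediction. The exact type of the returned value depends on the client version, so treat it as something that resolves to the output image URI rather than relying on a specific class.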

Input Parameters

img1 (string, optional): Input image 1.
img2 (string, optional): Input image 2.
img3 (string, optional): Input image 3.
seed (integer): Random seed. Leave blank to randomize the seed.
width (integer, default 1024, range 128 - 2048): Width of the output image.
height (integer, default 1024, range 128 - 2048): Height of the output image.
prompt (string, default "a photo of an astronaut riding a horse on mars"): Input prompt. For multi-modal to image generation with one or more input images, the placeholder in the prompt should be in the format <img><|image_*|></img> (for the first image the placeholder is <|image_1|>, for the second image it is <|image_2|>). Refer to the examples for more details.
offload_model (boolean, default false): Offload the model to CPU, which significantly reduces memory use but slows generation. You can disable separate_cfg_infer and set offload_model=True instead. Enabling both separate_cfg_infer and offload_model reduces memory further but gives the slowest generation.
guidance_scale (number, default 2.5, range 1 - 5): Classifier-free guidance scale for the text prompt.
inference_steps (integer, default 50, range 1 - 100): Number of denoising steps.
img_guidance_scale (number, default 1.6, range 1 - 2): Classifier-free guidance scale for images.
separate_cfg_infer (boolean, default true): Whether to run a separate inference pass for each guidance term. This reduces memory use.
max_input_image_size (integer, default 1024, range 128 - 2048): Maximum input image size.
use_input_image_size_as_output (boolean, default false): Automatically set the output image size to match the input image size. For editing and controlnet tasks, this ensures the output has the same size as the input, which leads to better results.
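The numeric ranges and the placeholder format in the list above are easy to get wrong when building requests by hand. The sketch below checks values locally before a request is sent. It assumes the documented ranges are inclusive at both ends, which the page does not state explicitly; `RANGES`, `image_placeholder`, and `check_ranges` are hypothetical names introduced for illustration.

```python
# Ranges copied from the parameter list: name -> (min, max), assumed inclusive.
RANGES = {
    "width": (128, 2048),
    "height": (128, 2048),
    "guidance_scale": (1, 5),
    "inference_steps": (1, 100),
    "img_guidance_scale": (1, 2),
    "max_input_image_size": (128, 2048),
}


def image_placeholder(index: int) -> str:
    """Return the prompt placeholder for input image 1, 2 or 3."""
    if index not in (1, 2, 3):
        raise ValueError("the model accepts img1, img2 and img3 only")
    return f"<img><|image_{index}|></img>"


def check_ranges(params: dict) -> list:
    """Return a list of readable problems; an empty list means all values fit."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside {low} - {high}")
    return problems
```

For example, `check_ranges({"inference_steps": 150})` reports one problem, while the defaults listed above pass without any.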

Output Schema

Output

Type: string • Format: uri

Example Execution Logs

Using seed: 5102
  0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:05<04:05,  5.01s/it]
  4%|▍         | 2/50 [00:07<02:47,  3.49s/it]
  6%|▌         | 3/50 [00:09<02:19,  2.96s/it]
  8%|▊         | 4/50 [00:12<02:04,  2.71s/it]
 10%|█         | 5/50 [00:14<01:55,  2.57s/it]
 12%|█▏        | 6/50 [00:16<01:49,  2.48s/it]
 14%|█▍        | 7/50 [00:19<01:44,  2.43s/it]
 16%|█▌        | 8/50 [00:21<01:40,  2.40s/it]
 18%|█▊        | 9/50 [00:23<01:37,  2.38s/it]
 20%|██        | 10/50 [00:26<01:34,  2.36s/it]
 22%|██▏       | 11/50 [00:28<01:31,  2.36s/it]
 24%|██▍       | 12/50 [00:30<01:29,  2.36s/it]
 26%|██▌       | 13/50 [00:33<01:27,  2.37s/it]
 28%|██▊       | 14/50 [00:35<01:25,  2.37s/it]
 30%|███       | 15/50 [00:37<01:22,  2.36s/it]
 32%|███▏      | 16/50 [00:40<01:20,  2.36s/it]
 34%|███▍      | 17/50 [00:42<01:17,  2.35s/it]
 36%|███▌      | 18/50 [00:44<01:15,  2.35s/it]
 38%|███▊      | 19/50 [00:47<01:12,  2.34s/it]
 40%|████      | 20/50 [00:49<01:09,  2.33s/it]
 42%|████▏     | 21/50 [00:51<01:07,  2.33s/it]
 44%|████▍     | 22/50 [00:54<01:05,  2.32s/it]
 46%|████▌     | 23/50 [00:56<01:02,  2.32s/it]
 48%|████▊     | 24/50 [00:58<01:00,  2.32s/it]
 50%|█████     | 25/50 [01:01<00:57,  2.32s/it]
 52%|█████▏    | 26/50 [01:03<00:55,  2.32s/it]
 54%|█████▍    | 27/50 [01:05<00:53,  2.32s/it]
 56%|█████▌    | 28/50 [01:08<00:50,  2.32s/it]
 58%|█████▊    | 29/50 [01:10<00:48,  2.32s/it]
 60%|██████    | 30/50 [01:12<00:46,  2.32s/it]
 62%|██████▏   | 31/50 [01:14<00:43,  2.32s/it]
 64%|██████▍   | 32/50 [01:17<00:41,  2.32s/it]
 66%|██████▌   | 33/50 [01:19<00:39,  2.32s/it]
 68%|██████▊   | 34/50 [01:21<00:37,  2.32s/it]
 70%|███████   | 35/50 [01:24<00:34,  2.32s/it]
 72%|███████▏  | 36/50 [01:26<00:32,  2.32s/it]
 74%|███████▍  | 37/50 [01:28<00:30,  2.32s/it]
 76%|███████▌  | 38/50 [01:31<00:27,  2.32s/it]
 78%|███████▊  | 39/50 [01:33<00:25,  2.33s/it]
 80%|████████  | 40/50 [01:35<00:23,  2.34s/it]
 82%|████████▏ | 41/50 [01:38<00:21,  2.34s/it]
 84%|████████▍ | 42/50 [01:40<00:18,  2.34s/it]
 86%|████████▌ | 43/50 [01:42<00:16,  2.34s/it]
 88%|████████▊ | 44/50 [01:45<00:14,  2.36s/it]
 90%|█████████ | 45/50 [01:47<00:11,  2.38s/it]
 92%|█████████▏| 46/50 [01:50<00:09,  2.40s/it]
 94%|█████████▍| 47/50 [01:52<00:07,  2.42s/it]
 96%|█████████▌| 48/50 [01:55<00:04,  2.44s/it]
 98%|█████████▊| 49/50 [01:57<00:02,  2.44s/it]
100%|██████████| 50/50 [01:59<00:00,  2.42s/it]
100%|██████████| 50/50 [01:59<00:00,  2.40s/it]
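The log above can be cross-checked against the reported prediction time. The sketch below parses a tqdm progress line with a regular expression written for the format shown here (not a tqdm API) and compares the denoising loop time with the 125.61s prediction time. The function name `parse_tqdm_line` is hypothetical, and attributing the remaining seconds to work outside the loop is an assumption; the page does not break that time down.

```python
import re

# Matches the tail of a tqdm line, e.g. "50/50 [01:59<00:00,  2.40s/it]".
TQDM_TAIL = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<[^,]*,\s*([\d.]+)s/it\]")


def parse_tqdm_line(line: str):
    """Return (step, total, elapsed_seconds, seconds_per_step) or None."""
    m = TQDM_TAIL.search(line)
    if m is None:
        return None  # e.g. the initial 0/50 line, which has no rate yet
    step, total, mins, secs, rate = m.groups()
    return int(step), int(total), int(mins) * 60 + int(secs), float(rate)


last = "100%|██████████| 50/50 [01:59<00:00,  2.40s/it]"
step, total, elapsed, rate = parse_tqdm_line(last)
estimated = total * rate      # 50 steps at the logged average rate
overhead = 125.61 - elapsed   # reported prediction time minus loop time
print(f"loop: {elapsed}s, steps x rate: {estimated:.1f}s, other: {overhead:.2f}s")
```

The loop accounts for roughly 119 to 120 of the 125.61 seconds, so lowering `inference_steps` is the most direct way to shorten a warm run at about 2.4 seconds per step.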

Version Details

Version ID: af66691a8952a0ce21b26e840835ad1efe176af159e10169ec5df6916338863b
Version Created: November 3, 2024
