vectorspacelab/omnigen 🖼️🔢📝✓ → 🖼️

▶️ 12.6K runs 📅 Oct 2024 ⚙️ Cog 0.9.23 🔗 GitHub 📄 Paper ⚖️ License
image-editing image-to-image text-to-image

About

OmniGen: Unified Image Generation

Example Output

Prompt:

"<|image_1|> Remove the woman's earrings. Replace the mug with a clear glass filled with sparkling iced cola."

Output

[Image: example output]

Performance Metrics

125.61s Prediction Time
266.90s Total Time
All Input Parameters
{
  "img1": "https://shitao-omnigen.hf.space/gradio_api/file=/tmp/gradio/23e02261d0d5dd416702fc5645b5192af904ddd5d83e6953913720c625b40306/t2i_woman_with_book.png",
  "width": 1024,
  "height": 1024,
  "prompt": "<img><|image_1|><img> Remove the woman's earrings. Replace the mug with a clear glass filled with sparkling iced cola.",
  "offload_model": false,
  "guidance_scale": 2.5,
  "inference_steps": 50,
  "img_guidance_scale": 1.6,
  "separate_cfg_infer": true,
  "max_input_image_size": 1024,
  "use_input_image_size_as_output": true
}
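The parameters above can be submitted programmatically. The following is a minimal sketch, assuming the third-party `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `run_edit` are hypothetical and not part of the model or the client. The version ID is the one listed under Version Details, and the placeholder is written with a closing `</img>` tag as described in the prompt parameter documentation.

```python
# Model reference: owner/name plus the version ID from "Version Details".
MODEL = (
    "vectorspacelab/omnigen:"
    "af66691a8952a0ce21b26e840835ad1efe176af159e10169ec5df6916338863b"
)


def build_input(image_url: str, instruction: str) -> dict:
    """Assemble a single-image editing payload (hypothetical helper).

    The numeric values mirror the example parameters shown on this page.
    """
    return {
        "img1": image_url,
        # The first input image is referenced as <|image_1|> inside <img>...</img>.
        "prompt": f"<img><|image_1|></img> {instruction}",
        "width": 1024,
        "height": 1024,
        "offload_model": False,
        "guidance_scale": 2.5,
        "inference_steps": 50,
        "img_guidance_scale": 1.6,
        "separate_cfg_infer": True,
        "max_input_image_size": 1024,
        "use_input_image_size_as_output": True,
    }


def run_edit(image_url: str, instruction: str):
    """Submit the payload to Replicate (hypothetical helper).

    The import is local so that build_input can be used without the client.
    The return type depends on the installed client version; per the output
    schema, the result refers to the generated image by URI.
    """
    import replicate  # third-party client, assumed to be installed

    return replicate.run(MODEL, input=build_input(image_url, instruction))
```

Calling `run_edit("https://example.com/photo.png", "Remove the woman's earrings.")` would start one prediction; the URL here is only an illustration.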
Input Parameters
img1 Type: string
Input image 1. Optional
img2 Type: string
Input image 2. Optional
img3 Type: string
Input image 3. Optional
seed Type: integer
Random seed. Leave blank to randomize the seed
width Type: integer, Default: 1024, Range: 128 - 2048
Width of the output image
height Type: integer, Default: 1024, Range: 128 - 2048
Height of the output image
prompt Type: string, Default: a photo of an astronaut riding a horse on mars
Input prompt. For multimodal-to-image generation with one or more input images, reference each image in the prompt with a placeholder of the form <img><|image_*|></img>: the placeholder for the first image is <|image_1|>, and the placeholder for the second image is <|image_2|>. Refer to the examples for more details
offload_model Type: boolean, Default: false
Offload the model to CPU, which significantly reduces memory cost but slows down generation. You can disable separate_cfg_infer and set offload_model=True. If both separate_cfg_infer and offload_model are True, memory use is reduced further, but generation is slowest
guidance_scale Type: number, Default: 2.5, Range: 1 - 5
Classifier-free guidance scale for the text prompt
inference_steps Type: integer, Default: 50, Range: 1 - 100
Number of denoising steps
img_guidance_scale Type: number, Default: 1.6, Range: 1 - 2
Classifier-free guidance scale for images
separate_cfg_infer Type: boolean, Default: true
Whether to use a separate inference process for each guidance. This reduces memory cost.
max_input_image_size Type: integer, Default: 1024, Range: 128 - 2048
Maximum input image size
use_input_image_size_as_output Type: boolean, Default: false
Automatically set the output image size to match the input image size. For editing and ControlNet tasks, this ensures the output image has the same size as the input image, which leads to better performance
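The ranges and placeholder rules listed above can be checked locally before a prediction is submitted. This is a minimal sketch, not the model's own validation: the function name `check_input` is hypothetical, the ranges are copied from the parameter list and assumed to be inclusive, and the requirement that every supplied image has a matching placeholder is an assumption drawn from the prompt description.

```python
import re

# Ranges copied from the parameter list; bounds are assumed to be inclusive.
RANGES = {
    "width": (128, 2048),
    "height": (128, 2048),
    "guidance_scale": (1, 5),
    "inference_steps": (1, 100),
    "img_guidance_scale": (1, 2),
    "max_input_image_size": (128, 2048),
}

# Matches placeholders such as <|image_1|> and captures the image index.
PLACEHOLDER = re.compile(r"<\|image_(\d+)\|>")


def check_input(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Range checks for the numeric parameters that are present.
    for name, (low, high) in RANGES.items():
        if name in payload and not low <= payload[name] <= high:
            problems.append(f"{name}={payload[name]} is outside {low} - {high}")

    # Cross-check placeholders in the prompt against img1..img3.
    referenced = {int(n) for n in PLACEHOLDER.findall(payload.get("prompt", ""))}
    supplied = {i for i in (1, 2, 3) if payload.get(f"img{i}")}
    for i in sorted(referenced - supplied):
        problems.append(f"prompt references <|image_{i}|> but img{i} is not set")
    for i in sorted(supplied - referenced):
        problems.append(f"img{i} is set but the prompt has no <|image_{i}|> placeholder")

    return problems
```

A text-only prompt with no images and in-range values produces an empty list, while a prompt that mentions `<|image_2|>` with only `img1` supplied produces two problems.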
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
Using seed: 5102
  0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:05<04:05,  5.01s/it]
  4%|▍         | 2/50 [00:07<02:47,  3.49s/it]
  6%|▌         | 3/50 [00:09<02:19,  2.96s/it]
  8%|▊         | 4/50 [00:12<02:04,  2.71s/it]
 10%|█         | 5/50 [00:14<01:55,  2.57s/it]
 12%|█▏        | 6/50 [00:16<01:49,  2.48s/it]
 14%|█▍        | 7/50 [00:19<01:44,  2.43s/it]
 16%|█▌        | 8/50 [00:21<01:40,  2.40s/it]
 18%|█▊        | 9/50 [00:23<01:37,  2.38s/it]
 20%|██        | 10/50 [00:26<01:34,  2.36s/it]
 22%|██▏       | 11/50 [00:28<01:31,  2.36s/it]
 24%|██▍       | 12/50 [00:30<01:29,  2.36s/it]
 26%|██▌       | 13/50 [00:33<01:27,  2.37s/it]
 28%|██▊       | 14/50 [00:35<01:25,  2.37s/it]
 30%|███       | 15/50 [00:37<01:22,  2.36s/it]
 32%|███▏      | 16/50 [00:40<01:20,  2.36s/it]
 34%|███▍      | 17/50 [00:42<01:17,  2.35s/it]
 36%|███▌      | 18/50 [00:44<01:15,  2.35s/it]
 38%|███▊      | 19/50 [00:47<01:12,  2.34s/it]
 40%|████      | 20/50 [00:49<01:09,  2.33s/it]
 42%|████▏     | 21/50 [00:51<01:07,  2.33s/it]
 44%|████▍     | 22/50 [00:54<01:05,  2.32s/it]
 46%|████▌     | 23/50 [00:56<01:02,  2.32s/it]
 48%|████▊     | 24/50 [00:58<01:00,  2.32s/it]
 50%|█████     | 25/50 [01:01<00:57,  2.32s/it]
 52%|█████▏    | 26/50 [01:03<00:55,  2.32s/it]
 54%|█████▍    | 27/50 [01:05<00:53,  2.32s/it]
 56%|█████▌    | 28/50 [01:08<00:50,  2.32s/it]
 58%|█████▊    | 29/50 [01:10<00:48,  2.32s/it]
 60%|██████    | 30/50 [01:12<00:46,  2.32s/it]
 62%|██████▏   | 31/50 [01:14<00:43,  2.32s/it]
 64%|██████▍   | 32/50 [01:17<00:41,  2.32s/it]
 66%|██████▌   | 33/50 [01:19<00:39,  2.32s/it]
 68%|██████▊   | 34/50 [01:21<00:37,  2.32s/it]
 70%|███████   | 35/50 [01:24<00:34,  2.32s/it]
 72%|███████▏  | 36/50 [01:26<00:32,  2.32s/it]
 74%|███████▍  | 37/50 [01:28<00:30,  2.32s/it]
 76%|███████▌  | 38/50 [01:31<00:27,  2.32s/it]
 78%|███████▊  | 39/50 [01:33<00:25,  2.33s/it]
 80%|████████  | 40/50 [01:35<00:23,  2.34s/it]
 82%|████████▏ | 41/50 [01:38<00:21,  2.34s/it]
 84%|████████▍ | 42/50 [01:40<00:18,  2.34s/it]
 86%|████████▌ | 43/50 [01:42<00:16,  2.34s/it]
 88%|████████▊ | 44/50 [01:45<00:14,  2.36s/it]
 90%|█████████ | 45/50 [01:47<00:11,  2.38s/it]
 92%|█████████▏| 46/50 [01:50<00:09,  2.40s/it]
 94%|█████████▍| 47/50 [01:52<00:07,  2.42s/it]
 96%|█████████▌| 48/50 [01:55<00:04,  2.44s/it]
 98%|█████████▊| 49/50 [01:57<00:02,  2.44s/it]
100%|██████████| 50/50 [01:59<00:00,  2.42s/it]
100%|██████████| 50/50 [01:59<00:00,  2.40s/it]
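The progress lines above follow the tqdm format `step/total [elapsed<remaining, rate]`. The sketch below shows one way to extract the step count, elapsed time, and per-step rate from such a log. The function names `parse_progress` and `average_step_seconds` are hypothetical, and the regular expression assumes the `MM:SS` elapsed format and `s/it` rate unit seen in this log; other tqdm layouts (for example `it/s` rates or hour fields) are not handled.

```python
import re

# Captures: step, total, elapsed minutes, elapsed seconds, seconds per iteration.
STEP = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<[^,]*,\s*([\d.]+)s/it\]")


def parse_progress(log: str) -> list[tuple[int, int, int, float]]:
    """Return (step, total, elapsed_seconds, seconds_per_step) for each matching line."""
    rows = []
    for line in log.splitlines():
        match = STEP.search(line)
        if match:  # the initial 0/50 line has no numeric rate and is skipped
            step, total, minutes, seconds, rate = match.groups()
            rows.append((int(step), int(total), int(minutes) * 60 + int(seconds), float(rate)))
    return rows


def average_step_seconds(rows: list[tuple[int, int, int, float]]) -> float:
    """Average seconds per step, computed from the last parsed line."""
    step, _total, elapsed, _rate = rows[-1]
    return elapsed / step
```

Applied to the log above, the final line gives 119 seconds elapsed over 50 steps, about 2.38 seconds per step, so the denoising loop accounts for most of the 125.61s prediction time reported under Performance Metrics.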
Version Details
Version ID
af66691a8952a0ce21b26e840835ad1efe176af159e10169ec5df6916338863b
Version Created
November 3, 2024