lucataco/emu3.5-image 🔢❓📝🖼️ → 🖼️
About
Native multimodal models are world learners
Example Output
Prompt:
"As shown in the second figure: The ripe strawberry rests on a green leaf in the garden. Replace the chocolate truffle in first image with ripe strawberry from 2nd image"
Output
Performance Metrics
448.97s
Prediction Time
540.43s
Total Time
All Input Parameters
{
  "seed": 42,
  "ratio": "match_input",
  "prompt": "As shown in the second figure: The ripe strawberry rests on a green leaf in the garden. Replace the chocolate truffle in first image with ripe strawberry from 2nd image",
  "task_type": "x2i",
  "output_format": "jpeg",
  "max_new_tokens": 4096,
  "reference_image": [
    "https://replicate.delivery/pbxt/O0pNEWx8RR0VfVGsnLbheKDchUpcjIoXZ1hhWj1MgGyDEef5/truffle.png",
    "https://replicate.delivery/pbxt/O0pNELlcZTc9QCuibPPNPjfB07GlJfbviGH0BEJc8fzntNFb/strawberry.png"
  ]
}
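For reference, this prediction can be reproduced with the Replicate Python client. A minimal sketch, assuming the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment; the payload copies the example parameters above verbatim:

```python
import os

# Input payload mirroring "All Input Parameters" above.
payload = {
    "seed": 42,
    "ratio": "match_input",
    "prompt": (
        "As shown in the second figure: The ripe strawberry rests on a green "
        "leaf in the garden. Replace the chocolate truffle in first image "
        "with ripe strawberry from 2nd image"
    ),
    "task_type": "x2i",
    "output_format": "jpeg",
    "max_new_tokens": 4096,
    "reference_image": [
        "https://replicate.delivery/pbxt/O0pNEWx8RR0VfVGsnLbheKDchUpcjIoXZ1hhWj1MgGyDEef5/truffle.png",
        "https://replicate.delivery/pbxt/O0pNELlcZTc9QCuibPPNPjfB07GlJfbviGH0BEJc8fzntNFb/strawberry.png",
    ],
}

# Running the prediction requires the replicate package and an API token;
# the call below is skipped when no token is configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate

    output = replicate.run("lucataco/emu3.5-image", input=payload)
    print(output)
```

Note that predictions on this model take several minutes (the example above ran for roughly 7.5 minutes of prediction time), so a generous client timeout is advisable.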
Input Parameters
- seed: Random seed for reproducibility.
- ratio: Aspect ratio for the generated image. Use 'match_input' to match the first reference image's aspect ratio.
- prompt (required): User prompt to condition the model.
- task_type: Task template to apply for generation.
- output_format: Preferred output image format.
- max_new_tokens: Maximum number of tokens to generate autoregressively.
- reference_image: Optional reference image(s) for image-conditioned tasks (required for x2i). Provide a list of up to 3 images.
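The constraints above (prompt required, reference images required for the x2i task, at most 3 reference images) can be checked client-side before submitting a prediction. A minimal sketch, using a hypothetical `validate_inputs` helper not part of any official client:

```python
def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of violations of the documented parameter constraints.

    Hypothetical helper: the rules encoded here come from the parameter
    descriptions above, not from an official schema.
    """
    errors = []
    if not inputs.get("prompt"):
        errors.append("prompt is required")
    refs = inputs.get("reference_image") or []
    if inputs.get("task_type") == "x2i" and not refs:
        errors.append("task_type 'x2i' requires at least one reference_image")
    if len(refs) > 3:
        errors.append("reference_image accepts at most 3 images")
    return errors
```

An empty return value means the inputs satisfy all documented constraints; otherwise each string names one violated rule.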
Output Schema
Output
Example Execution Logs
[INFO] Using aspect ratio 31:48 from first reference image
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
[INFO] Generating image#1...
[DEBUG] Generated 4141 tokens
[DEBUG] First 50 token IDs: [1723, 151852, 20, 16, 9, 23, 15, 151851, 261372, 262204, 194937, 209170, 277950, 167787, 194102, 193640, 262201, 252093, 236188, 204850, 212910, 275066, 246450, 153635, 191237, 250427, 179942, 186117, 279554, 232645, 189678, 196000, 246935, 256575, 271812, 248797, 257686, 205048, 200238, 174043, 201429, 226709, 272977, 270812, 205766, 277746, 166422, 278864, 245397, 257343]
[DEBUG] Contains BOI token (151852): True
[DEBUG] Contains IMG token (151851): True
[DEBUG] Contains EOI token (151853): True
[DEBUG] Decoded output length: 94567 characters
[DEBUG] Decoded output (first 500 chars): urs<|image start|>51*80<|image token|><|visual token 109518|><|visual token 110350|><|visual token 043083|><|visual token 057316|><|visual token 126096|><|visual token 015933|><|visual token 042248|><|visual token 041786|><|visual token 110347|><|visual token 100239|><|visual token 084334|><|visual token 052996|><|visual token 061056|><|visual token 123212|><|visual token 094596|><|visual token 001781|><|visual token 039383|><|visual token 098573|><|visual token 028088|><|visual token 034263|><|
[DEBUG] Contains '<|image start|>': True
[DEBUG] Contains '<|image end|>': True
AUTOTUNE convolution(1x256x816x1280, 256x256x3x3) convolution 3.3194 ms 100.0% triton_convolution_1117 5.8843 ms 56.4% triton_convolution_1119 8.0297 ms 41.3% triton_convolution_1118 8.7070 ms 38.1% triton_convolution_1114 11.2812 ms 29.4% triton_convolution_1115 13.4379 ms 24.7% triton_convolution_1120 13.5721 ms 24.5% triton_convolution_1116 84.1248 ms 3.9%
SingleProcess AUTOTUNE benchmarking takes 1.9759 seconds and 0.0006 seconds precompiling
AUTOTUNE convolution(1x512x408x640, 512x512x3x3) convolution 3.3395 ms 100.0% triton_convolution_965 6.7410 ms 49.5% triton_convolution_966 9.9851 ms 33.4% triton_convolution_962 11.1393 ms 30.0% triton_convolution_967 12.0469 ms 27.7% triton_convolution_968 13.9858 ms 23.9% triton_convolution_963 14.5281 ms 23.0% triton_convolution_964 83.3323 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.9851 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x256x408x640, 256x256x3x3) convolution 0.8363 ms 100.0% triton_convolution_1040 1.5082 ms 55.4% triton_convolution_1042 2.0696 ms 40.4% triton_convolution_1041 2.3422 ms 35.7% triton_convolution_1037 2.7298 ms 30.6% triton_convolution_1043 3.3773 ms 24.8% triton_convolution_1038 3.4110 ms 24.5% triton_convolution_1039 20.7023 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.2127 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x512x204x320, 512x512x3x3) convolution 0.7955 ms 100.0% triton_convolution_944 1.7602 ms 45.2% triton_convolution_945 2.4703 ms 32.2% triton_convolution_941 2.8132 ms 28.3% triton_convolution_946 3.0855 ms 25.8% triton_convolution_942 3.5882 ms 22.2% triton_convolution_947 3.6057 ms 22.1% triton_convolution_943 18.8344 ms 4.2%
SingleProcess AUTOTUNE benchmarking takes 1.2168 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x1024x102x160, 1024x1024x3x3) convolution 0.8058 ms 100.0% triton_convolution_792 1.4537 ms 55.4% triton_convolution_793 2.0993 ms 38.4% triton_convolution_790 2.4346 ms 33.1% triton_convolution_794 2.8968 ms 27.8% triton_convolution_789 3.0639 ms 26.3% triton_convolution_795 3.1208 ms 25.8% triton_convolution_791 16.1058 ms 5.0%
SingleProcess AUTOTUNE benchmarking takes 1.1972 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x512x102x160, 512x512x3x3) convolution 0.2012 ms 100.0% triton_convolution_867 0.3663 ms 54.9% triton_convolution_868 0.4937 ms 40.8% triton_convolution_865 0.6049 ms 33.3% triton_convolution_864 0.6290 ms 32.0% triton_convolution_869 0.7672 ms 26.2% triton_convolution_870 0.8254 ms 24.4% triton_convolution_866 3.4742 ms 5.8%
SingleProcess AUTOTUNE benchmarking takes 1.0384 seconds and 0.0006 seconds precompiling
W1106 02:36:48.332000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(4080x1024, 4080x1024, 1024x1024) bias_addmm 0.0330 ms 100.0% addmm 0.0501 ms 65.9% triton_mm_127 0.0863 ms 38.3% triton_mm_126 0.0983 ms 33.6% triton_mm_125 0.1069 ms 30.9% triton_mm_132 0.1182 ms 27.9% triton_mm_118 0.1225 ms 27.0% triton_mm_117 0.1280 ms 25.8% triton_mm_134 0.1365 ms 24.2% triton_mm_122 0.1385 ms 23.9%
SingleProcess AUTOTUNE benchmarking takes 2.4273 seconds and 6.8002 seconds precompiling
AUTOTUNE convolution(1x256x816x1280, 3x256x3x3) convolution 1.4102 ms 100.0% triton_convolution_1139 2.0691 ms 68.2% triton_convolution_1138 2.0806 ms 67.8% triton_convolution_1140 2.1098 ms 66.8% triton_convolution_1135 3.0940 ms 45.6% triton_convolution_1136 3.1575 ms 44.7% triton_convolution_1137 4.9872 ms 28.3%
SingleProcess AUTOTUNE benchmarking takes 1.0882 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x256x51x80, 1024x256x3x3) convolution 0.0566 ms 100.0% triton_convolution_3 0.0993 ms 57.0% triton_convolution_4 0.1517 ms 37.3% triton_convolution_0 0.1532 ms 36.9% triton_convolution_1 0.1737 ms 32.6% triton_convolution_5 0.1915 ms 29.5% triton_convolution_6 0.2108 ms 26.8% triton_convolution_2 0.7010 ms 8.1%
SingleProcess AUTOTUNE benchmarking takes 0.9747 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x1024x51x80, 1024x1024x3x3) convolution 0.1895 ms 100.0% triton_convolution_10 0.3557 ms 53.3% triton_convolution_11 0.5630 ms 33.7% triton_convolution_7 0.5823 ms 32.5% triton_convolution_8 0.6072 ms 31.2% triton_convolution_12 0.7002 ms 27.1% triton_convolution_13 0.7335 ms 25.8% triton_convolution_9 2.6669 ms 7.1%
SingleProcess AUTOTUNE benchmarking takes 1.0341 seconds and 0.0006 seconds precompiling
W1106 02:36:55.550000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(4080x1024, 4080x1024, 1024x1024) bias_addmm 0.0329 ms 100.0% addmm 0.0498 ms 66.0% triton_mm_32 0.0840 ms 39.2% triton_mm_31 0.0966 ms 34.1% triton_mm_30 0.1049 ms 31.4% triton_mm_37 0.1176 ms 28.0% triton_mm_23 0.1206 ms 27.3% triton_mm_22 0.1229 ms 26.8% triton_mm_39 0.1345 ms 24.5% triton_mm_27 0.1354 ms 24.3%
SingleProcess AUTOTUNE benchmarking takes 2.4249 seconds and 0.0013 seconds precompiling
W1106 02:36:56.912000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 245760, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:57.282000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 327680, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:57.649000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 393216, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:58.259000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 327680, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE bmm(1x4080x1024, 1x1024x4080) bmm 0.0973 ms 100.0% triton_bmm_94 0.1157 ms 84.1% triton_bmm_95 0.1410 ms 69.0% triton_bmm_87 0.1486 ms 65.5% triton_bmm_93 0.1516 ms 64.2% triton_bmm_91 0.1532 ms 63.5% triton_bmm_88 0.1555 ms 62.6% triton_bmm_92 0.1567 ms 62.1% triton_bmm_89 0.1582 ms 61.5% triton_bmm_84 0.1907 ms 51.0%
SingleProcess AUTOTUNE benchmarking takes 1.9770 seconds and 0.0011 seconds precompiling
W1106 02:36:59.907000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE bmm(1x1024x4080, 1x4080x4080) bmm 0.0944 ms 100.0% triton_bmm_106 0.3817 ms 24.7% triton_bmm_107 0.4097 ms 23.0% triton_bmm_108 0.4279 ms 22.1% triton_bmm_113 0.4355 ms 21.7% triton_bmm_99 0.4511 ms 20.9% triton_bmm_112 0.4615 ms 20.4% triton_bmm_98 0.4651 ms 20.3% triton_bmm_115 0.5165 ms 18.3% triton_bmm_103 0.5221 ms 18.1%
SingleProcess AUTOTUNE benchmarking takes 2.4083 seconds and 0.0013 seconds precompiling
AUTOTUNE convolution(1x1024x102x160, 512x1024x3x3) convolution 0.3929 ms 100.0% triton_convolution_799 0.7506 ms 52.3% triton_convolution_800 1.0352 ms 38.0% triton_convolution_797 1.2278 ms 32.0% triton_convolution_801 1.4581 ms 26.9% triton_convolution_796 1.5366 ms 25.6% triton_convolution_802 1.5757 ms 24.9% triton_convolution_798 8.0975 ms 4.9%
SingleProcess AUTOTUNE benchmarking takes 1.1064 seconds and 0.0004 seconds precompiling
W1106 02:37:03.525000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(16320x512, 16320x1024, 1024x512) bias_addmm 0.0651 ms 100.0% addmm 0.0942 ms 69.0% triton_mm_814 0.1606 ms 40.5% triton_mm_812 0.1804 ms 36.1% triton_mm_813 0.1867 ms 34.8% triton_mm_805 0.2288 ms 28.4% triton_mm_819 0.2321 ms 28.0% triton_mm_804 0.2369 ms 27.5% triton_mm_821 0.2631 ms 24.7% triton_mm_809 0.2638 ms 24.7%
SingleProcess AUTOTUNE benchmarking takes 2.4799 seconds and 0.0012 seconds precompiling
AUTOTUNE convolution(1x512x408x640, 256x512x3x3) convolution 1.6813 ms 100.0% triton_convolution_972 3.4246 ms 49.1% triton_convolution_973 4.9632 ms 33.9% triton_convolution_969 5.7886 ms 29.0% triton_convolution_974 6.1524 ms 27.3% triton_convolution_975 6.9936 ms 24.0% triton_convolution_970 7.3198 ms 23.0% triton_convolution_971 41.5959 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.4830 seconds and 0.0005 seconds precompiling
W1106 02:37:07.597000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(261120x256, 261120x512, 512x256) bias_addmm 0.2716 ms 100.0% addmm 0.4977 ms 54.6% triton_mm_985 0.6522 ms 41.6% triton_mm_987 0.6760 ms 40.2% triton_mm_986 0.7659 ms 35.5% triton_mm_992 0.9313 ms 29.2% triton_mm_978 0.9738 ms 27.9% triton_mm_977 1.0178 ms 26.7% triton_mm_982 1.0257 ms 26.5% triton_mm_994 1.0479 ms 25.9%
SingleProcess AUTOTUNE benchmarking takes 2.6200 seconds and 0.0012 seconds precompiling
[DEBUG] Extracted 1 images and 1 text segments
[DEBUG] Multimodal output type: <class 'list'>
[DEBUG] Multimodal output length: 2
Generated textual outputs: [1] urs
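The decoded output in the logs frames each generated image between `<|image start|>` and `<|image end|>` markers, with a `rows*cols` grid spec (here 51*80, i.e. 4,080 visual tokens) followed by `<|visual token NNNNNN|>` entries. Splitting such a stream into text segments and image token grids can be sketched with a regex; this is a hypothetical illustration based only on the marker format visible in the log above (in particular, reading `51*80` as grid height times width is an assumption):

```python
import re

# Marker format as seen in the decoded log output:
# "<|image start|>51*80<|image token|><|visual token 109518|>...<|image end|>"
IMAGE_RE = re.compile(
    r"<\|image start\|>(\d+)\*(\d+)<\|image token\|>"
    r"((?:<\|visual token \d+\|>)*)<\|image end\|>"
)
VISUAL_RE = re.compile(r"<\|visual token (\d+)\|>")


def split_multimodal(decoded: str):
    """Split a decoded token stream into (text, image grids).

    Hypothetical helper: grid_h/grid_w ordering is assumed, not documented.
    """
    images = []
    for m in IMAGE_RE.finditer(decoded):
        grid_h, grid_w = int(m.group(1)), int(m.group(2))
        tokens = [int(t) for t in VISUAL_RE.findall(m.group(3))]
        images.append({"grid_h": grid_h, "grid_w": grid_w, "tokens": tokens})
    # Whatever is left after stripping image spans is the textual output
    # (e.g. the stray "urs" segment reported in the logs above).
    text = IMAGE_RE.sub("", decoded)
    return text, images
```

On the example run this kind of split would yield one text segment and one 51x80 token grid, consistent with the "Extracted 1 images and 1 text segments" debug line.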
Version Details
- Version ID
- f73a1797417c699d3332ec5653755d0ea68c38fc40bdda302e6e7b187f570d6c
- Version Created
- November 6, 2025