lucataco/emu3.5-image 🔢❓📝🖼️ → 🖼️

▶️ 168 runs 📅 Nov 2025 ⚙️ Cog 0.16.8 🔗 GitHub 📄 Paper ⚖️ License
image-editing image-to-image text-to-image

About

Native multimodal models are world learners

Example Output

Prompt:

"As shown in the second figure: The ripe strawberry rests on a green leaf in the garden. Replace the chocolate truffle in first image with ripe strawberry from 2nd image"

Output

Example output

Performance Metrics

448.97s Prediction Time
540.43s Total Time
All Input Parameters
{
  "seed": 42,
  "ratio": "match_input",
  "prompt": "As shown in the second figure: The ripe strawberry rests on a green leaf in the garden. Replace the chocolate truffle in first image with ripe strawberry from 2nd image",
  "task_type": "x2i",
  "output_format": "jpeg",
  "max_new_tokens": 4096,
  "reference_image": [
    "https://replicate.delivery/pbxt/O0pNEWx8RR0VfVGsnLbheKDchUpcjIoXZ1hhWj1MgGyDEef5/truffle.png",
    "https://replicate.delivery/pbxt/O0pNELlcZTc9QCuibPPNPjfB07GlJfbviGH0BEJc8fzntNFb/strawberry.png"
  ]
}
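
A minimal sketch of how these inputs could be submitted with the Replicate Python client (the model name and version ID are taken from this page; the client call itself is just one way to invoke the model):

import replicate

# Image-conditioned editing (x2i) run mirroring the inputs listed above.
# The version ID is the one shown under Version Details below.
output = replicate.run(
    "lucataco/emu3.5-image:f73a1797417c699d3332ec5653755d0ea68c38fc40bdda302e6e7b187f570d6c",
    input={
        "seed": 42,
        "ratio": "match_input",
        "prompt": (
            "As shown in the second figure: The ripe strawberry rests on a green leaf "
            "in the garden. Replace the chocolate truffle in first image with ripe "
            "strawberry from 2nd image"
        ),
        "task_type": "x2i",
        "output_format": "jpeg",
        "max_new_tokens": 4096,
        "reference_image": [
            "https://replicate.delivery/pbxt/O0pNEWx8RR0VfVGsnLbheKDchUpcjIoXZ1hhWj1MgGyDEef5/truffle.png",
            "https://replicate.delivery/pbxt/O0pNELlcZTc9QCuibPPNPjfB07GlJfbviGH0BEJc8fzntNFb/strawberry.png",
        ],
    },
)
print(output)  # list of generated image URIs (see Output Schema below)
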
Input Parameters
seed Type: integer, Default: 42
Random seed for reproducibility.
ratio Default: match_input
Aspect ratio for generated image. Use 'match_input' to match the first reference image's aspect ratio.
prompt (required) Type: string
User prompt to condition the model.
task_type Default: t2i
Task template to apply for generation.
output_format Default: png
Preferred output image format.
max_new_tokens Type: integer, Default: 4096, Range: 512 - 32768
Maximum number of tokens to autoregressively generate.
reference_image Type: array
Optional reference image(s) for image-conditioned tasks (required for x2i). Provide as a list of up to 3 images.
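
For plain text-to-image use, only the prompt is required and the defaults above apply. A minimal sketch under the same assumptions (Replicate Python client; the prompt here is invented for illustration):

import replicate

# Text-to-image (t2i) run: no reference_image is needed, and ratio,
# output_format, and max_new_tokens fall back to the defaults documented above.
output = replicate.run(
    "lucataco/emu3.5-image:f73a1797417c699d3332ec5653755d0ea68c38fc40bdda302e6e7b187f570d6c",
    input={
        "prompt": "A ripe strawberry resting on a green leaf in a garden, macro photo",
        "task_type": "t2i",
        "seed": 42,
    },
)
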
Output Schema

Output

Type: array, Items Type: string, Items Format: uri
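
Assuming the returned items are plain image URLs, as this schema describes, the outputs can be fetched and saved locally with a short sketch like the following (an illustration, not part of the model page):

import urllib.request
import replicate

# Run the model, then save each returned image URI to disk.
output = replicate.run(
    "lucataco/emu3.5-image",
    input={"prompt": "A ripe strawberry on a green leaf", "task_type": "t2i"},
)
for i, url in enumerate(output):
    urllib.request.urlretrieve(url, f"emu35_output_{i}.png")
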

Example Execution Logs
[INFO] Using aspect ratio 31:48 from first reference image
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
[INFO] Generating image#1...
[DEBUG] Generated 4141 tokens
[DEBUG] First 50 token IDs: [1723, 151852, 20, 16, 9, 23, 15, 151851, 261372, 262204, 194937, 209170, 277950, 167787, 194102, 193640, 262201, 252093, 236188, 204850, 212910, 275066, 246450, 153635, 191237, 250427, 179942, 186117, 279554, 232645, 189678, 196000, 246935, 256575, 271812, 248797, 257686, 205048, 200238, 174043, 201429, 226709, 272977, 270812, 205766, 277746, 166422, 278864, 245397, 257343]
[DEBUG] Contains BOI token (151852): True
[DEBUG] Contains IMG token (151851): True
[DEBUG] Contains EOI token (151853): True
[DEBUG] Decoded output length: 94567 characters
[DEBUG] Decoded output (first 500 chars): urs<|image start|>51*80<|image token|><|visual token 109518|><|visual token 110350|><|visual token 043083|><|visual token 057316|><|visual token 126096|><|visual token 015933|><|visual token 042248|><|visual token 041786|><|visual token 110347|><|visual token 100239|><|visual token 084334|><|visual token 052996|><|visual token 061056|><|visual token 123212|><|visual token 094596|><|visual token 001781|><|visual token 039383|><|visual token 098573|><|visual token 028088|><|visual token 034263|><|
[DEBUG] Contains '<|image start|>': True
[DEBUG] Contains '<|image end|>': True
AUTOTUNE convolution(1x256x816x1280, 256x256x3x3)
convolution 3.3194 ms 100.0%
triton_convolution_1117 5.8843 ms 56.4%
triton_convolution_1119 8.0297 ms 41.3%
triton_convolution_1118 8.7070 ms 38.1%
triton_convolution_1114 11.2812 ms 29.4%
triton_convolution_1115 13.4379 ms 24.7%
triton_convolution_1120 13.5721 ms 24.5%
triton_convolution_1116 84.1248 ms 3.9%
SingleProcess AUTOTUNE benchmarking takes 1.9759 seconds and 0.0006 seconds precompiling
AUTOTUNE convolution(1x512x408x640, 512x512x3x3)
convolution 3.3395 ms 100.0%
triton_convolution_965 6.7410 ms 49.5%
triton_convolution_966 9.9851 ms 33.4%
triton_convolution_962 11.1393 ms 30.0%
triton_convolution_967 12.0469 ms 27.7%
triton_convolution_968 13.9858 ms 23.9%
triton_convolution_963 14.5281 ms 23.0%
triton_convolution_964 83.3323 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.9851 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x256x408x640, 256x256x3x3)
convolution 0.8363 ms 100.0%
triton_convolution_1040 1.5082 ms 55.4%
triton_convolution_1042 2.0696 ms 40.4%
triton_convolution_1041 2.3422 ms 35.7%
triton_convolution_1037 2.7298 ms 30.6%
triton_convolution_1043 3.3773 ms 24.8%
triton_convolution_1038 3.4110 ms 24.5%
triton_convolution_1039 20.7023 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.2127 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x512x204x320, 512x512x3x3)
convolution 0.7955 ms 100.0%
triton_convolution_944 1.7602 ms 45.2%
triton_convolution_945 2.4703 ms 32.2%
triton_convolution_941 2.8132 ms 28.3%
triton_convolution_946 3.0855 ms 25.8%
triton_convolution_942 3.5882 ms 22.2%
triton_convolution_947 3.6057 ms 22.1%
triton_convolution_943 18.8344 ms 4.2%
SingleProcess AUTOTUNE benchmarking takes 1.2168 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x1024x102x160, 1024x1024x3x3)
convolution 0.8058 ms 100.0%
triton_convolution_792 1.4537 ms 55.4%
triton_convolution_793 2.0993 ms 38.4%
triton_convolution_790 2.4346 ms 33.1%
triton_convolution_794 2.8968 ms 27.8%
triton_convolution_789 3.0639 ms 26.3%
triton_convolution_795 3.1208 ms 25.8%
triton_convolution_791 16.1058 ms 5.0%
SingleProcess AUTOTUNE benchmarking takes 1.1972 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x512x102x160, 512x512x3x3)
convolution 0.2012 ms 100.0%
triton_convolution_867 0.3663 ms 54.9%
triton_convolution_868 0.4937 ms 40.8%
triton_convolution_865 0.6049 ms 33.3%
triton_convolution_864 0.6290 ms 32.0%
triton_convolution_869 0.7672 ms 26.2%
triton_convolution_870 0.8254 ms 24.4%
triton_convolution_866 3.4742 ms 5.8%
SingleProcess AUTOTUNE benchmarking takes 1.0384 seconds and 0.0006 seconds precompiling
W1106 02:36:48.332000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(4080x1024, 4080x1024, 1024x1024)
bias_addmm 0.0330 ms 100.0%
addmm 0.0501 ms 65.9%
triton_mm_127 0.0863 ms 38.3%
triton_mm_126 0.0983 ms 33.6%
triton_mm_125 0.1069 ms 30.9%
triton_mm_132 0.1182 ms 27.9%
triton_mm_118 0.1225 ms 27.0%
triton_mm_117 0.1280 ms 25.8%
triton_mm_134 0.1365 ms 24.2%
triton_mm_122 0.1385 ms 23.9%
SingleProcess AUTOTUNE benchmarking takes 2.4273 seconds and 6.8002 seconds precompiling
AUTOTUNE convolution(1x256x816x1280, 3x256x3x3)
convolution 1.4102 ms 100.0%
triton_convolution_1139 2.0691 ms 68.2%
triton_convolution_1138 2.0806 ms 67.8%
triton_convolution_1140 2.1098 ms 66.8%
triton_convolution_1135 3.0940 ms 45.6%
triton_convolution_1136 3.1575 ms 44.7%
triton_convolution_1137 4.9872 ms 28.3%
SingleProcess AUTOTUNE benchmarking takes 1.0882 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x256x51x80, 1024x256x3x3)
convolution 0.0566 ms 100.0%
triton_convolution_3 0.0993 ms 57.0%
triton_convolution_4 0.1517 ms 37.3%
triton_convolution_0 0.1532 ms 36.9%
triton_convolution_1 0.1737 ms 32.6%
triton_convolution_5 0.1915 ms 29.5%
triton_convolution_6 0.2108 ms 26.8%
triton_convolution_2 0.7010 ms 8.1%
SingleProcess AUTOTUNE benchmarking takes 0.9747 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x1024x51x80, 1024x1024x3x3)
convolution 0.1895 ms 100.0%
triton_convolution_10 0.3557 ms 53.3%
triton_convolution_11 0.5630 ms 33.7%
triton_convolution_7 0.5823 ms 32.5%
triton_convolution_8 0.6072 ms 31.2%
triton_convolution_12 0.7002 ms 27.1%
triton_convolution_13 0.7335 ms 25.8%
triton_convolution_9 2.6669 ms 7.1%
SingleProcess AUTOTUNE benchmarking takes 1.0341 seconds and 0.0006 seconds precompiling
W1106 02:36:55.550000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(4080x1024, 4080x1024, 1024x1024)
bias_addmm 0.0329 ms 100.0%
addmm 0.0498 ms 66.0%
triton_mm_32 0.0840 ms 39.2%
triton_mm_31 0.0966 ms 34.1%
triton_mm_30 0.1049 ms 31.4%
triton_mm_37 0.1176 ms 28.0%
triton_mm_23 0.1206 ms 27.3%
triton_mm_22 0.1229 ms 26.8%
triton_mm_39 0.1345 ms 24.5%
triton_mm_27 0.1354 ms 24.3%
SingleProcess AUTOTUNE benchmarking takes 2.4249 seconds and 0.0013 seconds precompiling
W1106 02:36:56.912000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 245760, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:57.282000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 327680, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:57.649000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 393216, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:58.259000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 327680, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE bmm(1x4080x1024, 1x1024x4080)
bmm 0.0973 ms 100.0%
triton_bmm_94 0.1157 ms 84.1%
triton_bmm_95 0.1410 ms 69.0%
triton_bmm_87 0.1486 ms 65.5%
triton_bmm_93 0.1516 ms 64.2%
triton_bmm_91 0.1532 ms 63.5%
triton_bmm_88 0.1555 ms 62.6%
triton_bmm_92 0.1567 ms 62.1%
triton_bmm_89 0.1582 ms 61.5%
triton_bmm_84 0.1907 ms 51.0%
SingleProcess AUTOTUNE benchmarking takes 1.9770 seconds and 0.0011 seconds precompiling
W1106 02:36:59.907000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE bmm(1x1024x4080, 1x4080x4080)
bmm 0.0944 ms 100.0%
triton_bmm_106 0.3817 ms 24.7%
triton_bmm_107 0.4097 ms 23.0%
triton_bmm_108 0.4279 ms 22.1%
triton_bmm_113 0.4355 ms 21.7%
triton_bmm_99 0.4511 ms 20.9%
triton_bmm_112 0.4615 ms 20.4%
triton_bmm_98 0.4651 ms 20.3%
triton_bmm_115 0.5165 ms 18.3%
triton_bmm_103 0.5221 ms 18.1%
SingleProcess AUTOTUNE benchmarking takes 2.4083 seconds and 0.0013 seconds precompiling
AUTOTUNE convolution(1x1024x102x160, 512x1024x3x3)
convolution 0.3929 ms 100.0%
triton_convolution_799 0.7506 ms 52.3%
triton_convolution_800 1.0352 ms 38.0%
triton_convolution_797 1.2278 ms 32.0%
triton_convolution_801 1.4581 ms 26.9%
triton_convolution_796 1.5366 ms 25.6%
triton_convolution_802 1.5757 ms 24.9%
triton_convolution_798 8.0975 ms 4.9%
SingleProcess AUTOTUNE benchmarking takes 1.1064 seconds and 0.0004 seconds precompiling
W1106 02:37:03.525000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(16320x512, 16320x1024, 1024x512)
bias_addmm 0.0651 ms 100.0%
addmm 0.0942 ms 69.0%
triton_mm_814 0.1606 ms 40.5%
triton_mm_812 0.1804 ms 36.1%
triton_mm_813 0.1867 ms 34.8%
triton_mm_805 0.2288 ms 28.4%
triton_mm_819 0.2321 ms 28.0%
triton_mm_804 0.2369 ms 27.5%
triton_mm_821 0.2631 ms 24.7%
triton_mm_809 0.2638 ms 24.7%
SingleProcess AUTOTUNE benchmarking takes 2.4799 seconds and 0.0012 seconds precompiling
AUTOTUNE convolution(1x512x408x640, 256x512x3x3)
convolution 1.6813 ms 100.0%
triton_convolution_972 3.4246 ms 49.1%
triton_convolution_973 4.9632 ms 33.9%
triton_convolution_969 5.7886 ms 29.0%
triton_convolution_974 6.1524 ms 27.3%
triton_convolution_975 6.9936 ms 24.0%
triton_convolution_970 7.3198 ms 23.0%
triton_convolution_971 41.5959 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.4830 seconds and 0.0005 seconds precompiling
W1106 02:37:07.597000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(261120x256, 261120x512, 512x256)
bias_addmm 0.2716 ms 100.0%
addmm 0.4977 ms 54.6%
triton_mm_985 0.6522 ms 41.6%
triton_mm_987 0.6760 ms 40.2%
triton_mm_986 0.7659 ms 35.5%
triton_mm_992 0.9313 ms 29.2%
triton_mm_978 0.9738 ms 27.9%
triton_mm_977 1.0178 ms 26.7%
triton_mm_982 1.0257 ms 26.5%
triton_mm_994 1.0479 ms 25.9%
SingleProcess AUTOTUNE benchmarking takes 2.6200 seconds and 0.0012 seconds precompiling
[DEBUG] Extracted 1 images and 1 text segments
[DEBUG] Multimodal output type: <class 'list'>
[DEBUG] Multimodal output length: 2
Generated textual outputs:
[1] urs
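
A note on the log lines above: the decoded stream interleaves plain text (here just "urs") with an image segment bracketed by <|image start|> and <|image end|>; the header right after <|image start|> gives the visual-token grid (51*80) and each <|visual token NNNNNN|> is one code in that grid. A rough, hypothetical sketch of splitting such a stream into text and image segments (an illustration of the format shown in the debug output, not the model's actual extraction code):

import re

# Split a decoded Emu3.5-style stream into text segments and image segments.
# Each image segment starts with a "rows*cols" grid header followed by
# <|visual token ...|> codes; everything outside the markers is plain text.
def split_segments(decoded: str):
    pattern = re.compile(r"<\|image start\|>(.*?)<\|image end\|>", re.DOTALL)
    images, texts, last = [], [], 0
    for m in pattern.finditer(decoded):
        before = decoded[last:m.start()].strip()
        if before:
            texts.append(before)
        header = m.group(1).split("<|", 1)[0].strip()  # e.g. "51*80"
        rows, cols = (int(x) for x in header.split("*"))
        codes = [int(c) for c in re.findall(r"<\|visual token (\d+)\|>", m.group(1))]
        images.append({"rows": rows, "cols": cols, "codes": codes})
        last = m.end()
    tail = decoded[last:].strip()
    if tail:
        texts.append(tail)
    return images, texts

For the run logged above, this would report one image (a 51x80 token grid) and one text segment ("urs"), matching the "Extracted 1 images and 1 text segments" debug line.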
Version Details
Version ID
f73a1797417c699d3332ec5653755d0ea68c38fc40bdda302e6e7b187f570d6c
Version Created
November 6, 2025
Run on Replicate →