lucataco/emu3.5-image 🔢❓📝🖼️ → 🖼️
About
Native multimodal models are world learners
Example Output
Prompt:
"As shown in the second figure: The ripe strawberry rests on a green leaf in the garden. Replace the chocolate truffle in first image with ripe strawberry from 2nd image"
Output
Performance Metrics
448.97s
Prediction Time
540.43s
Total Time
All Input Parameters
{
  "seed": 42,
  "ratio": "match_input",
  "prompt": "As shown in the second figure: The ripe strawberry rests on a green leaf in the garden. Replace the chocolate truffle in first image with ripe strawberry from 2nd image",
  "task_type": "x2i",
  "output_format": "jpeg",
  "max_new_tokens": 4096,
  "reference_image": [
    "https://replicate.delivery/pbxt/O0pNEWx8RR0VfVGsnLbheKDchUpcjIoXZ1hhWj1MgGyDEef5/truffle.png",
    "https://replicate.delivery/pbxt/O0pNELlcZTc9QCuibPPNPjfB07GlJfbviGH0BEJc8fzntNFb/strawberry.png"
  ]
}
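For reference, this prediction can be reproduced with the Replicate Python client. A minimal sketch, assuming the `replicate` package is installed and `REPLICATE_API_TOKEN` is set in the environment; the payload copies the example parameters above verbatim:

```python
import os

# Input payload mirroring "All Input Parameters" above.
payload = {
    "seed": 42,
    "ratio": "match_input",
    "prompt": (
        "As shown in the second figure: The ripe strawberry rests on a green "
        "leaf in the garden. Replace the chocolate truffle in first image "
        "with ripe strawberry from 2nd image"
    ),
    "task_type": "x2i",
    "output_format": "jpeg",
    "max_new_tokens": 4096,
    "reference_image": [
        "https://replicate.delivery/pbxt/O0pNEWx8RR0VfVGsnLbheKDchUpcjIoXZ1hhWj1MgGyDEef5/truffle.png",
        "https://replicate.delivery/pbxt/O0pNELlcZTc9QCuibPPNPjfB07GlJfbviGH0BEJc8fzntNFb/strawberry.png",
    ],
}

# Running the prediction requires the replicate package and an API token;
# the call below is skipped when no token is configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate

    output = replicate.run("lucataco/emu3.5-image", input=payload)
    print(output)
```

Note that predictions on this model take several minutes (the example above ran for roughly 7.5 minutes of prediction time), so a generous client timeout is advisable.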
Input Parameters
- seed: Random seed for reproducibility.
- ratio: Aspect ratio for the generated image. Use 'match_input' to match the first reference image's aspect ratio.
- prompt (required): User prompt to condition the model.
- task_type: Task template to apply for generation.
- output_format: Preferred output image format.
- max_new_tokens: Maximum number of tokens to generate autoregressively.
- reference_image: Optional reference image(s) for image-conditioned tasks (required for x2i). Provide a list of up to 3 images.
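The constraints above (prompt required, reference images required for the x2i task, at most 3 reference images) can be checked client-side before submitting a prediction. A minimal sketch, using a hypothetical `validate_inputs` helper not part of any official client:

```python
def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of violations of the documented parameter constraints.

    Hypothetical helper: the rules encoded here come from the parameter
    descriptions above, not from an official schema.
    """
    errors = []
    if not inputs.get("prompt"):
        errors.append("prompt is required")
    refs = inputs.get("reference_image") or []
    if inputs.get("task_type") == "x2i" and not refs:
        errors.append("task_type 'x2i' requires at least one reference_image")
    if len(refs) > 3:
        errors.append("reference_image accepts at most 3 images")
    return errors
```

An empty return value means the inputs satisfy all documented constraints; otherwise each string names one violated rule.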
Output Schema
Output
Example Execution Logs
[INFO] Using aspect ratio 31:48 from first reference image
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
[INFO] Generating image#1...
[DEBUG] Generated 4141 tokens
[DEBUG] First 50 token IDs: [1723, 151852, 20, 16, 9, 23, 15, 151851, 261372, 262204, 194937, 209170, 277950, 167787, 194102, 193640, 262201, 252093, 236188, 204850, 212910, 275066, 246450, 153635, 191237, 250427, 179942, 186117, 279554, 232645, 189678, 196000, 246935, 256575, 271812, 248797, 257686, 205048, 200238, 174043, 201429, 226709, 272977, 270812, 205766, 277746, 166422, 278864, 245397, 257343]
[DEBUG] Contains BOI token (151852): True
[DEBUG] Contains IMG token (151851): True
[DEBUG] Contains EOI token (151853): True
[DEBUG] Decoded output length: 94567 characters
[DEBUG] Decoded output (first 500 chars): urs<|image start|>51*80<|image token|><|visual token 109518|><|visual token 110350|><|visual token 043083|><|visual token 057316|><|visual token 126096|><|visual token 015933|><|visual token 042248|><|visual token 041786|><|visual token 110347|><|visual token 100239|><|visual token 084334|><|visual token 052996|><|visual token 061056|><|visual token 123212|><|visual token 094596|><|visual token 001781|><|visual token 039383|><|visual token 098573|><|visual token 028088|><|visual token 034263|><|
[DEBUG] Contains '<|image start|>': True
[DEBUG] Contains '<|image end|>': True
AUTOTUNE convolution(1x256x816x1280, 256x256x3x3) convolution 3.3194 ms 100.0% triton_convolution_1117 5.8843 ms 56.4% triton_convolution_1119 8.0297 ms 41.3% triton_convolution_1118 8.7070 ms 38.1% triton_convolution_1114 11.2812 ms 29.4% triton_convolution_1115 13.4379 ms 24.7% triton_convolution_1120 13.5721 ms 24.5% triton_convolution_1116 84.1248 ms 3.9%
SingleProcess AUTOTUNE benchmarking takes 1.9759 seconds and 0.0006 seconds precompiling
AUTOTUNE convolution(1x512x408x640, 512x512x3x3) convolution 3.3395 ms 100.0% triton_convolution_965 6.7410 ms 49.5% triton_convolution_966 9.9851 ms 33.4% triton_convolution_962 11.1393 ms 30.0% triton_convolution_967 12.0469 ms 27.7% triton_convolution_968 13.9858 ms 23.9% triton_convolution_963 14.5281 ms 23.0% triton_convolution_964 83.3323 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.9851 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x256x408x640, 256x256x3x3) convolution 0.8363 ms 100.0% triton_convolution_1040 1.5082 ms 55.4% triton_convolution_1042 2.0696 ms 40.4% triton_convolution_1041 2.3422 ms 35.7% triton_convolution_1037 2.7298 ms 30.6% triton_convolution_1043 3.3773 ms 24.8% triton_convolution_1038 3.4110 ms 24.5% triton_convolution_1039 20.7023 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.2127 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x512x204x320, 512x512x3x3) convolution 0.7955 ms 100.0% triton_convolution_944 1.7602 ms 45.2% triton_convolution_945 2.4703 ms 32.2% triton_convolution_941 2.8132 ms 28.3% triton_convolution_946 3.0855 ms 25.8% triton_convolution_942 3.5882 ms 22.2% triton_convolution_947 3.6057 ms 22.1% triton_convolution_943 18.8344 ms 4.2%
SingleProcess AUTOTUNE benchmarking takes 1.2168 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x1024x102x160, 1024x1024x3x3) convolution 0.8058 ms 100.0% triton_convolution_792 1.4537 ms 55.4% triton_convolution_793 2.0993 ms 38.4% triton_convolution_790 2.4346 ms 33.1% triton_convolution_794 2.8968 ms 27.8% triton_convolution_789 3.0639 ms 26.3% triton_convolution_795 3.1208 ms 25.8% triton_convolution_791 16.1058 ms 5.0%
SingleProcess AUTOTUNE benchmarking takes 1.1972 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x512x102x160, 512x512x3x3) convolution 0.2012 ms 100.0% triton_convolution_867 0.3663 ms 54.9% triton_convolution_868 0.4937 ms 40.8% triton_convolution_865 0.6049 ms 33.3% triton_convolution_864 0.6290 ms 32.0% triton_convolution_869 0.7672 ms 26.2% triton_convolution_870 0.8254 ms 24.4% triton_convolution_866 3.4742 ms 5.8%
SingleProcess AUTOTUNE benchmarking takes 1.0384 seconds and 0.0006 seconds precompiling
W1106 02:36:48.332000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(4080x1024, 4080x1024, 1024x1024) bias_addmm 0.0330 ms 100.0% addmm 0.0501 ms 65.9% triton_mm_127 0.0863 ms 38.3% triton_mm_126 0.0983 ms 33.6% triton_mm_125 0.1069 ms 30.9% triton_mm_132 0.1182 ms 27.9% triton_mm_118 0.1225 ms 27.0% triton_mm_117 0.1280 ms 25.8% triton_mm_134 0.1365 ms 24.2% triton_mm_122 0.1385 ms 23.9%
SingleProcess AUTOTUNE benchmarking takes 2.4273 seconds and 6.8002 seconds precompiling
AUTOTUNE convolution(1x256x816x1280, 3x256x3x3) convolution 1.4102 ms 100.0% triton_convolution_1139 2.0691 ms 68.2% triton_convolution_1138 2.0806 ms 67.8% triton_convolution_1140 2.1098 ms 66.8% triton_convolution_1135 3.0940 ms 45.6% triton_convolution_1136 3.1575 ms 44.7% triton_convolution_1137 4.9872 ms 28.3%
SingleProcess AUTOTUNE benchmarking takes 1.0882 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x256x51x80, 1024x256x3x3) convolution 0.0566 ms 100.0% triton_convolution_3 0.0993 ms 57.0% triton_convolution_4 0.1517 ms 37.3% triton_convolution_0 0.1532 ms 36.9% triton_convolution_1 0.1737 ms 32.6% triton_convolution_5 0.1915 ms 29.5% triton_convolution_6 0.2108 ms 26.8% triton_convolution_2 0.7010 ms 8.1%
SingleProcess AUTOTUNE benchmarking takes 0.9747 seconds and 0.0005 seconds precompiling
AUTOTUNE convolution(1x1024x51x80, 1024x1024x3x3) convolution 0.1895 ms 100.0% triton_convolution_10 0.3557 ms 53.3% triton_convolution_11 0.5630 ms 33.7% triton_convolution_7 0.5823 ms 32.5% triton_convolution_8 0.6072 ms 31.2% triton_convolution_12 0.7002 ms 27.1% triton_convolution_13 0.7335 ms 25.8% triton_convolution_9 2.6669 ms 7.1%
SingleProcess AUTOTUNE benchmarking takes 1.0341 seconds and 0.0006 seconds precompiling
W1106 02:36:55.550000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(4080x1024, 4080x1024, 1024x1024) bias_addmm 0.0329 ms 100.0% addmm 0.0498 ms 66.0% triton_mm_32 0.0840 ms 39.2% triton_mm_31 0.0966 ms 34.1% triton_mm_30 0.1049 ms 31.4% triton_mm_37 0.1176 ms 28.0% triton_mm_23 0.1206 ms 27.3% triton_mm_22 0.1229 ms 26.8% triton_mm_39 0.1345 ms 24.5% triton_mm_27 0.1354 ms 24.3%
SingleProcess AUTOTUNE benchmarking takes 2.4249 seconds and 0.0013 seconds precompiling
W1106 02:36:56.912000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 245760, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:57.282000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 327680, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:57.649000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 393216, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
W1106 02:36:58.259000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 327680, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE bmm(1x4080x1024, 1x1024x4080) bmm 0.0973 ms 100.0% triton_bmm_94 0.1157 ms 84.1% triton_bmm_95 0.1410 ms 69.0% triton_bmm_87 0.1486 ms 65.5% triton_bmm_93 0.1516 ms 64.2% triton_bmm_91 0.1532 ms 63.5% triton_bmm_88 0.1555 ms 62.6% triton_bmm_92 0.1567 ms 62.1% triton_bmm_89 0.1582 ms 61.5% triton_bmm_84 0.1907 ms 51.0%
SingleProcess AUTOTUNE benchmarking takes 1.9770 seconds and 0.0011 seconds precompiling
W1106 02:36:59.907000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE bmm(1x1024x4080, 1x4080x4080) bmm 0.0944 ms 100.0% triton_bmm_106 0.3817 ms 24.7% triton_bmm_107 0.4097 ms 23.0% triton_bmm_108 0.4279 ms 22.1% triton_bmm_113 0.4355 ms 21.7% triton_bmm_99 0.4511 ms 20.9% triton_bmm_112 0.4615 ms 20.4% triton_bmm_98 0.4651 ms 20.3% triton_bmm_115 0.5165 ms 18.3% triton_bmm_103 0.5221 ms 18.1%
SingleProcess AUTOTUNE benchmarking takes 2.4083 seconds and 0.0013 seconds precompiling
AUTOTUNE convolution(1x1024x102x160, 512x1024x3x3) convolution 0.3929 ms 100.0% triton_convolution_799 0.7506 ms 52.3% triton_convolution_800 1.0352 ms 38.0% triton_convolution_797 1.2278 ms 32.0% triton_convolution_801 1.4581 ms 26.9% triton_convolution_796 1.5366 ms 25.6% triton_convolution_802 1.5757 ms 24.9% triton_convolution_798 8.0975 ms 4.9%
SingleProcess AUTOTUNE benchmarking takes 1.1064 seconds and 0.0004 seconds precompiling
W1106 02:37:03.525000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(16320x512, 16320x1024, 1024x512) bias_addmm 0.0651 ms 100.0% addmm 0.0942 ms 69.0% triton_mm_814 0.1606 ms 40.5% triton_mm_812 0.1804 ms 36.1% triton_mm_813 0.1867 ms 34.8% triton_mm_805 0.2288 ms 28.4% triton_mm_819 0.2321 ms 28.0% triton_mm_804 0.2369 ms 27.5% triton_mm_821 0.2631 ms 24.7% triton_mm_809 0.2638 ms 24.7%
SingleProcess AUTOTUNE benchmarking takes 2.4799 seconds and 0.0012 seconds precompiling
AUTOTUNE convolution(1x512x408x640, 256x512x3x3) convolution 1.6813 ms 100.0% triton_convolution_972 3.4246 ms 49.1% triton_convolution_973 4.9632 ms 33.9% triton_convolution_969 5.7886 ms 29.0% triton_convolution_974 6.1524 ms 27.3% triton_convolution_975 6.9936 ms 24.0% triton_convolution_970 7.3198 ms 23.0% triton_convolution_971 41.5959 ms 4.0%
SingleProcess AUTOTUNE benchmarking takes 1.4830 seconds and 0.0005 seconds precompiling
W1106 02:37:07.597000 130759211187712 torch/_inductor/select_algorithm.py:1469] [0/0_1] out of resource: shared memory, Required: 294912, Hardware limit: 232448. Reducing block sizes or `num_stages` may help.
AUTOTUNE addmm(261120x256, 261120x512, 512x256) bias_addmm 0.2716 ms 100.0% addmm 0.4977 ms 54.6% triton_mm_985 0.6522 ms 41.6% triton_mm_987 0.6760 ms 40.2% triton_mm_986 0.7659 ms 35.5% triton_mm_992 0.9313 ms 29.2% triton_mm_978 0.9738 ms 27.9% triton_mm_977 1.0178 ms 26.7% triton_mm_982 1.0257 ms 26.5% triton_mm_994 1.0479 ms 25.9%
SingleProcess AUTOTUNE benchmarking takes 2.6200 seconds and 0.0012 seconds precompiling
[DEBUG] Extracted 1 images and 1 text segments
[DEBUG] Multimodal output type: <class 'list'>
[DEBUG] Multimodal output length: 2
Generated textual outputs: [1] urs
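The decoded output in the logs frames each generated image between `<|image start|>` and `<|image end|>` markers, with a `rows*cols` grid spec (here 51*80, i.e. 4,080 visual tokens) followed by `<|visual token NNNNNN|>` entries. Splitting such a stream into text segments and image token grids can be sketched with a regex; this is a hypothetical illustration based only on the marker format visible in the log above (in particular, reading `51*80` as grid height times width is an assumption):

```python
import re

# Marker format as seen in the decoded log output:
# "<|image start|>51*80<|image token|><|visual token 109518|>...<|image end|>"
IMAGE_RE = re.compile(
    r"<\|image start\|>(\d+)\*(\d+)<\|image token\|>"
    r"((?:<\|visual token \d+\|>)*)<\|image end\|>"
)
VISUAL_RE = re.compile(r"<\|visual token (\d+)\|>")


def split_multimodal(decoded: str):
    """Split a decoded token stream into (text, image grids).

    Hypothetical helper: grid_h/grid_w ordering is assumed, not documented.
    """
    images = []
    for m in IMAGE_RE.finditer(decoded):
        grid_h, grid_w = int(m.group(1)), int(m.group(2))
        tokens = [int(t) for t in VISUAL_RE.findall(m.group(3))]
        images.append({"grid_h": grid_h, "grid_w": grid_w, "tokens": tokens})
    # Whatever is left after stripping image spans is the textual output
    # (e.g. the stray "urs" segment reported in the logs above).
    text = IMAGE_RE.sub("", decoded)
    return text, images
```

On the example run this kind of split would yield one text segment and one 51x80 token grid, consistent with the "Extracted 1 images and 1 text segments" debug line.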
Version Details
- Version ID
- f73a1797417c699d3332ec5653755d0ea68c38fc40bdda302e6e7b187f570d6c
- Version Created
- November 6, 2025