qwen/qwen-image-lora-trainer
About
Fine-tunable Qwen Image model with exceptional composition abilities - train custom LoRAs for any style or subject

Example Output
"a close-up half-portrait photo of a young AI researcher wearing a sleek teal and silver hoodie with "QWEN-IMAGE 20B MMDiT" emblazoned across the chest in glowing holographic Chinese and English text, has trendy round wireframe glasses reflecting code snippets, purple-streaked hair in a messy bun with tiny LED circuit board hair clips, she is standing in front of a neon-lit tech conference booth in Hangzhou with a massive banner reading "ALIBABA QWEN TEAM - SUPERIOR TEXT RENDERING" in perfect multi-line layout, very late at night during a hackathon, a small robotic owl is perched on her shoulder with miniature projection wings displaying sample generated images, in her hand she is holding up a tablet showing a perfectly rendered Chinese calligraphy poem generated in real-time, scattered around her feet are discarded energy drink cans and prototype VR headsets, in her other hand she's making a peace sign gesture, under her arm is a plushie shaped like a neural network node, she is wearing a backpack covered in open-source Apache 2.0 license patches and a tiny Hugging Face mascot keychain"
All Input Parameters
{
  "prompt": "a close-up half-portrait photo of a young AI researcher wearing a sleek teal and silver hoodie with \"QWEN-IMAGE 20B MMDiT\" emblazoned across the chest in glowing holographic Chinese and English text, has trendy round wireframe glasses reflecting code snippets, purple-streaked hair in a messy bun with tiny LED circuit board hair clips, she is standing in front of a neon-lit tech conference booth in Hangzhou with a massive banner reading \"ALIBABA QWEN TEAM - SUPERIOR TEXT RENDERING\" in perfect multi-line layout, very late at night during a hackathon, a small robotic owl is perched on her shoulder with miniature projection wings displaying sample generated images, in her hand she is holding up a tablet showing a perfectly rendered Chinese calligraphy poem generated in real-time, scattered around her feet are discarded energy drink cans and prototype VR headsets, in her other hand she's making a peace sign gesture, under her arm is a plushie shaped like a neural network node, she is wearing a backpack covered in open-source Apache 2.0 license patches and a tiny Hugging Face mascot keychain",
  "go_fast": false,
  "guidance": 4,
  "image_size": "optimize_for_quality",
  "lora_scale": 1,
  "aspect_ratio": "16:9",
  "output_format": "webp",
  "enhance_prompt": false,
  "output_quality": 80,
  "negative_prompt": "",
  "num_inference_steps": 50
}
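The parameters above can be passed to the model through Replicate's Python client. The following is a minimal sketch, assuming the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The helper names `build_input` and `generate` are hypothetical, and the short prompt is a placeholder rather than the example prompt. The version ID is the one listed under Version Details.

```python
MODEL = "qwen/qwen-image-lora-trainer"
VERSION = "04c2e2d513ab19063e5ba401e322feb9ad8ca9150b8c5e6417a656e719edbd55"


def build_input(prompt, **overrides):
    """Return an input dict mirroring the example parameters on this page.

    Hypothetical helper: any keyword argument (e.g. seed=42) overrides or
    extends the example values.
    """
    params = {
        "prompt": prompt,
        "go_fast": False,
        "guidance": 4,
        "image_size": "optimize_for_quality",
        "lora_scale": 1,
        "aspect_ratio": "16:9",
        "output_format": "webp",
        "enhance_prompt": False,
        "output_quality": 80,
        "negative_prompt": "",
        "num_inference_steps": 50,
    }
    params.update(overrides)
    return params


def generate(prompt, **overrides):
    """Run the pinned model version remotely (needs network and an API token)."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(f"{MODEL}:{VERSION}", input=build_input(prompt, **overrides))


if __name__ == "__main__":
    # Only builds and prints the payload; no request is sent.
    print(build_input("a watercolor fox reading a book", seed=42))
```

Pinning the version ID keeps results comparable across runs; combined with a fixed `seed`, the same input should be reproducible.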
Input Parameters
- seed: Set a seed for reproducibility. Random by default.
- width: Custom width in pixels. Provide both width and height for custom dimensions (overrides aspect_ratio/image_size).
- height: Custom height in pixels. Provide both width and height for custom dimensions (overrides aspect_ratio/image_size).
- prompt (required): The main prompt for image generation.
- go_fast: Use LCM-LoRA to accelerate image generation (trades quality for 8x speed).
- guidance: Guidance scale for image generation. Defaults to 1 if go_fast, else 3.5.
- image_size: Image size preset (quality = larger, speed = faster). Ignored if width and height are both provided.
- lora_scale: Scale for LoRA weights (0 = base model, 1 = full LoRA).
- aspect_ratio: Aspect ratio for the generated image. Ignored if width and height are both provided.
- output_format: Format of the output images.
- enhance_prompt: Automatically enhance the prompt for better image generation.
- output_quality: Quality when saving images (0-100, higher is better, 100 = lossless).
- negative_prompt: Things you do not want to see in your image.
- replicate_weights: Path to LoRA weights file. Leave blank to use base model.
- num_inference_steps: Number of denoising steps. More steps = higher quality. Defaults to 4 if go_fast, else 28.
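The conditional defaults and the width/height override described in the list can be sketched as follows. This is an illustration of the documented rules only, not the model's actual code: `resolve_settings` is a hypothetical name, the default values chosen here for `aspect_ratio` and `image_size` are assumptions, and falling back to the preset when only one of width or height is given is also an assumption.

```python
from typing import Optional


def resolve_settings(go_fast: bool = False,
                     guidance: Optional[float] = None,
                     num_inference_steps: Optional[int] = None,
                     width: Optional[int] = None,
                     height: Optional[int] = None,
                     aspect_ratio: str = "16:9",                   # assumed default
                     image_size: str = "optimize_for_quality"):    # assumed default
    """Apply the documented defaults to a set of user-supplied parameters."""
    # Documented: guidance defaults to 1 if go_fast, else 3.5.
    if guidance is None:
        guidance = 1 if go_fast else 3.5
    # Documented: steps default to 4 if go_fast, else 28.
    if num_inference_steps is None:
        num_inference_steps = 4 if go_fast else 28
    # Documented: custom dimensions apply only when BOTH width and height are set,
    # in which case aspect_ratio and image_size are ignored.
    if width is not None and height is not None:
        size = ("custom", width, height)
    else:
        size = ("preset", aspect_ratio, image_size)
    return {
        "guidance": guidance,
        "num_inference_steps": num_inference_steps,
        "size": size,
    }


if __name__ == "__main__":
    print(resolve_settings(go_fast=True))
    print(resolve_settings(width=1024, height=768))
```

Explicit values always win over the defaults, which is why the example run above can use 50 steps and a guidance of 4 with `go_fast` disabled.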
Example Execution Logs
📐 Using quality preset for 16:9: 1664x928
Using random seed: 2565513562
Generating: a close-up half-portrait photo of a young AI researcher wearing a sleek teal and silver hoodie with "QWEN-IMAGE 20B MMDiT" emblazoned across the chest in glowing holographic Chinese and English text, has trendy round wireframe glasses reflecting code snippets, purple-streaked hair in a messy bun with tiny LED circuit board hair clips, she is standing in front of a neon-lit tech conference booth in Hangzhou with a massive banner reading "ALIBABA QWEN TEAM - SUPERIOR TEXT RENDERING" in perfect multi-line layout, very late at night during a hackathon, a small robotic owl is perched on her shoulder with miniature projection wings displaying sample generated images, in her hand she is holding up a tablet showing a perfectly rendered Chinese calligraphy poem generated in real-time, scattered around her feet are discarded energy drink cans and prototype VR headsets, in her other hand she's making a peace sign gesture, under her arm is a plushie shaped like a neural network node, she is wearing a backpack covered in open-source Apache 2.0 license patches and a tiny Hugging Face mascot keychain (1664x928, steps=50, seed=2565513562)
0%| | 0/50 [00:00<?, ?it/s]
2%|▏ | 1/50 [00:00<00:29, 1.69it/s]
4%|▍ | 2/50 [00:01<00:28, 1.67it/s]
6%|▌ | 3/50 [00:01<00:28, 1.66it/s]
8%|▊ | 4/50 [00:02<00:27, 1.66it/s]
10%|█ | 5/50 [00:03<00:27, 1.66it/s]
12%|█▏ | 6/50 [00:03<00:26, 1.65it/s]
14%|█▍ | 7/50 [00:04<00:26, 1.65it/s]
16%|█▌ | 8/50 [00:04<00:25, 1.65it/s]
18%|█▊ | 9/50 [00:05<00:24, 1.65it/s]
20%|██ | 10/50 [00:06<00:24, 1.65it/s]
22%|██▏ | 11/50 [00:06<00:23, 1.65it/s]
24%|██▍ | 12/50 [00:07<00:23, 1.65it/s]
26%|██▌ | 13/50 [00:07<00:22, 1.65it/s]
28%|██▊ | 14/50 [00:08<00:21, 1.65it/s]
30%|███ | 15/50 [00:09<00:21, 1.65it/s]
32%|███▏ | 16/50 [00:09<00:20, 1.65it/s]
34%|███▍ | 17/50 [00:10<00:20, 1.64it/s]
36%|███▌ | 18/50 [00:10<00:19, 1.65it/s]
38%|███▊ | 19/50 [00:11<00:18, 1.65it/s]
40%|████ | 20/50 [00:12<00:18, 1.65it/s]
42%|████▏ | 21/50 [00:12<00:17, 1.65it/s]
44%|████▍ | 22/50 [00:13<00:17, 1.65it/s]
46%|████▌ | 23/50 [00:13<00:16, 1.65it/s]
48%|████▊ | 24/50 [00:14<00:15, 1.65it/s]
50%|█████ | 25/50 [00:15<00:15, 1.65it/s]
52%|█████▏ | 26/50 [00:15<00:14, 1.65it/s]
54%|█████▍ | 27/50 [00:16<00:13, 1.65it/s]
56%|█████▌ | 28/50 [00:16<00:13, 1.65it/s]
58%|█████▊ | 29/50 [00:17<00:12, 1.65it/s]
60%|██████ | 30/50 [00:18<00:12, 1.65it/s]
62%|██████▏ | 31/50 [00:18<00:11, 1.65it/s]
64%|██████▍ | 32/50 [00:19<00:10, 1.65it/s]
66%|██████▌ | 33/50 [00:20<00:10, 1.65it/s]
68%|██████▊ | 34/50 [00:20<00:09, 1.65it/s]
70%|███████ | 35/50 [00:21<00:09, 1.65it/s]
72%|███████▏ | 36/50 [00:21<00:08, 1.65it/s]
74%|███████▍ | 37/50 [00:22<00:07, 1.65it/s]
76%|███████▌ | 38/50 [00:23<00:07, 1.65it/s]
78%|███████▊ | 39/50 [00:23<00:06, 1.65it/s]
80%|████████ | 40/50 [00:24<00:06, 1.65it/s]
82%|████████▏ | 41/50 [00:24<00:05, 1.65it/s]
84%|████████▍ | 42/50 [00:25<00:04, 1.65it/s]
86%|████████▌ | 43/50 [00:26<00:04, 1.65it/s]
88%|████████▊ | 44/50 [00:26<00:03, 1.65it/s]
90%|█████████ | 45/50 [00:27<00:03, 1.64it/s]
92%|█████████▏| 46/50 [00:27<00:02, 1.65it/s]
94%|█████████▍| 47/50 [00:28<00:01, 1.64it/s]
96%|█████████▌| 48/50 [00:29<00:01, 1.64it/s]
98%|█████████▊| 49/50 [00:29<00:00, 1.64it/s]
100%|██████████| 50/50 [00:30<00:00, 1.64it/s]
100%|██████████| 50/50 [00:30<00:00, 1.65it/s]
Generation took 30.55 seconds
Total safe images: 1/1
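The summary lines in these logs are regular enough to extract run statistics from. The sketch below is one way to do so, assuming the log wording stays as shown here; `summarize` is a hypothetical helper, and the `<prompt>` token in the sample string is a placeholder standing in for the full prompt text.

```python
import re

# Sample built from the summary lines of the log above; "<prompt>" is a placeholder.
LOG = (
    "Using quality preset for 16:9: 1664x928 "
    "Using random seed: 2565513562 "
    "Generating: <prompt> (1664x928, steps=50, seed=2565513562) "
    "Generation took 30.55 seconds Total safe images: 1/1"
)


def summarize(log: str) -> dict:
    """Pull resolution, steps, seed and wall time from a log and derive throughput."""
    run = re.search(r"\((\d+)x(\d+), steps=(\d+), seed=(\d+)\)", log)
    took = re.search(r"Generation took ([\d.]+) seconds", log)
    if run is None or took is None:
        raise ValueError("log does not contain the expected summary lines")
    width, height, steps, seed = map(int, run.groups())
    seconds = float(took.group(1))
    return {
        "width": width,
        "height": height,
        "steps": steps,
        "seed": seed,
        "seconds": seconds,
        # Average over the whole run, including any non-denoising overhead.
        "steps_per_second": steps / seconds,
    }


if __name__ == "__main__":
    stats = summarize(LOG)
    print(f"{stats['steps_per_second']:.2f} steps/s at {stats['width']}x{stats['height']}")
```

For this run, 50 steps in 30.55 seconds works out to roughly 1.64 steps per second, in line with the 1.64 to 1.65 it/s reported by the progress bar.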
Version Details
- Version ID: 04c2e2d513ab19063e5ba401e322feb9ad8ca9150b8c5e6417a656e719edbd55
- Version Created: August 20, 2025