qwen/qwen-image-lora-trainer 🔢📝✓❓🖼️ → 🖼️

⭐ Official ▶️ 146 runs 📅 Aug 2025 ⚙️ Cog 0.16.2 🔗 GitHub
composition-control image-lora-training text-to-image

About

Fine-tunable Qwen Image model with exceptional composition abilities - train custom LoRAs for any style or subject

Example Output

Prompt:

"a close-up half-portrait photo of a young AI researcher wearing a sleek teal and silver hoodie with "QWEN-IMAGE 20B MMDiT" emblazoned across the chest in glowing holographic Chinese and English text, has trendy round wireframe glasses reflecting code snippets, purple-streaked hair in a messy bun with tiny LED circuit board hair clips, she is standing in front of a neon-lit tech conference booth in Hangzhou with a massive banner reading "ALIBABA QWEN TEAM - SUPERIOR TEXT RENDERING" in perfect multi-line layout, very late at night during a hackathon, a small robotic owl is perched on her shoulder with miniature projection wings displaying sample generated images, in her hand she is holding up a tablet showing a perfectly rendered Chinese calligraphy poem generated in real-time, scattered around her feet are discarded energy drink cans and prototype VR headsets, in her other hand she's making a peace sign gesture, under her arm is a plushie shaped like a neural network node, she is wearing a backpack covered in open-source Apache 2.0 license patches and a tiny Hugging Face mascot keychain"

Output

[Image: example output]

Performance Metrics

30.89s Prediction Time
30.90s Total Time
All Input Parameters
{
  "prompt": "a close-up half-portrait photo of a young AI researcher wearing a sleek teal and silver hoodie with \"QWEN-IMAGE 20B MMDiT\" emblazoned across the chest in glowing holographic Chinese and English text, has trendy round wireframe glasses reflecting code snippets, purple-streaked hair in a messy bun with tiny LED circuit board hair clips, she is standing in front of a neon-lit tech conference booth in Hangzhou with a massive banner reading \"ALIBABA QWEN TEAM - SUPERIOR TEXT RENDERING\" in perfect multi-line layout, very late at night during a hackathon, a small robotic owl is perched on her shoulder with miniature projection wings displaying sample generated images, in her hand she is holding up a tablet showing a perfectly rendered Chinese calligraphy poem generated in real-time, scattered around her feet are discarded energy drink cans and prototype VR headsets, in her other hand she's making a peace sign gesture, under her arm is a plushie shaped like a neural network node, she is wearing a backpack covered in open-source Apache 2.0 license patches and a tiny Hugging Face mascot keychain",
  "go_fast": false,
  "guidance": 4,
  "image_size": "optimize_for_quality",
  "lora_scale": 1,
  "aspect_ratio": "16:9",
  "output_format": "webp",
  "enhance_prompt": false,
  "output_quality": 80,
  "negative_prompt": "",
  "num_inference_steps": 50
}
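
The payload above can be sent with a client library. The following is a minimal sketch, assuming the `replicate` Python package is installed and a `REPLICATE_API_TOKEN` environment variable is set. `build_input` and `run_prediction` are hypothetical helper names used only for illustration; the model name and version ID are copied from this page.

```python
# Model and version identifiers copied from this page.
MODEL = "qwen/qwen-image-lora-trainer"
VERSION = "04c2e2d513ab19063e5ba401e322feb9ad8ca9150b8c5e6417a656e719edbd55"


def build_input(prompt: str, **overrides) -> dict:
    """Return the example payload with a new prompt and optional overrides."""
    payload = {
        "prompt": prompt,
        "go_fast": False,
        "guidance": 4,
        "image_size": "optimize_for_quality",
        "lora_scale": 1,
        "aspect_ratio": "16:9",
        "output_format": "webp",
        "enhance_prompt": False,
        "output_quality": 80,
        "negative_prompt": "",
        "num_inference_steps": 50,
    }
    payload.update(overrides)  # e.g. seed=1234 or replicate_weights="..."
    return payload


def run_prediction(prompt: str, **overrides):
    """Send the payload to Replicate (needs network access and an API token)."""
    # Imported lazily so build_input can be used without the package installed.
    import replicate

    return replicate.run(f"{MODEL}:{VERSION}", input=build_input(prompt, **overrides))
```

Calling `build_input("a red fox", seed=7)` returns the example settings with a fixed seed added, which is a convenient way to make a run repeatable before passing it to `run_prediction`.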
Input Parameters
seed Type: integer
Set a seed for reproducibility. Random by default.
width Type: integer, Range: 512 - 2048
Custom width in pixels. Provide both width and height for custom dimensions (overrides aspect_ratio/image_size).
height Type: integer, Range: 512 - 2048
Custom height in pixels. Provide both width and height for custom dimensions (overrides aspect_ratio/image_size).
prompt (required) Type: string
The main prompt for image generation
go_fast Type: boolean, Default: false
Use LCM-LoRA to accelerate image generation (trades quality for 8x speed)
guidance Type: number, Default: 4, Range: 0 - 10
Guidance scale for image generation. Defaults to 1 if go_fast, else 3.5.
image_size Default: optimize_for_quality
Image size preset (quality = larger, speed = faster). Ignored if width and height are both provided.
lora_scale Type: number, Default: 1, Range: 0 - 3
Scale for LoRA weights (0 = base model, 1 = full LoRA)
aspect_ratio Default: 16:9
Aspect ratio for the generated image. Ignored if width and height are both provided.
output_format Default: webp
Format of the output images
enhance_prompt Type: boolean, Default: false
Automatically enhance the prompt for better image generation
output_quality Type: integer, Default: 80, Range: 0 - 100
Quality when saving images (0-100, higher is better, 100 = lossless)
negative_prompt Type: string, Default: (empty string)
Things you do not want to see in your image
replicate_weights Type: string
Path to LoRA weights file. Leave blank to use base model.
num_inference_steps Type: integer, Default: 50, Range: 0 - 50
Number of denoising steps. More steps = higher quality. Defaults to 4 if go_fast, else 28.
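
The ranges and the width/height precedence rule listed above can be checked on the client before submitting a request. This is a minimal sketch under stated assumptions: `RANGES`, `check_ranges`, and `uses_custom_size` are hypothetical names, and the server may apply additional validation that is not described on this page.

```python
# Numeric ranges as listed in the parameter reference on this page.
RANGES = {
    "width": (512, 2048),
    "height": (512, 2048),
    "guidance": (0, 10),
    "lora_scale": (0, 3),
    "output_quality": (0, 100),
    "num_inference_steps": (0, 50),
}


def check_ranges(params: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not params.get("prompt"):
        problems.append("prompt is required")
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        # Parameters that are not supplied fall back to server-side defaults.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low} - {high}")
    return problems


def uses_custom_size(params: dict) -> bool:
    """True when both width and height are set, which overrides the presets."""
    return params.get("width") is not None and params.get("height") is not None
```

For example, `check_ranges({"prompt": "x", "width": 256})` reports that the width is below the documented minimum, and `uses_custom_size` returns `False` when only one of `width` or `height` is supplied, in which case `aspect_ratio` and `image_size` still apply.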
Output Schema

Output

Type: string, Format: uri

Example Execution Logs
📐 Using quality preset for 16:9: 1664x928
Using random seed: 2565513562
Generating: a close-up half-portrait photo of a young AI researcher wearing a sleek teal and silver hoodie with "QWEN-IMAGE 20B MMDiT" emblazoned across the chest in glowing holographic Chinese and English text, has trendy round wireframe glasses reflecting code snippets, purple-streaked hair in a messy bun with tiny LED circuit board hair clips, she is standing in front of a neon-lit tech conference booth in Hangzhou with a massive banner reading "ALIBABA QWEN TEAM - SUPERIOR TEXT RENDERING" in perfect multi-line layout, very late at night during a hackathon, a small robotic owl is perched on her shoulder with miniature projection wings displaying sample generated images, in her hand she is holding up a tablet showing a perfectly rendered Chinese calligraphy poem generated in real-time, scattered around her feet are discarded energy drink cans and prototype VR headsets, in her other hand she's making a peace sign gesture, under her arm is a plushie shaped like a neural network node, she is wearing a backpack covered in open-source Apache 2.0 license patches and a tiny Hugging Face mascot keychain (1664x928, steps=50, seed=2565513562)
  0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:00<00:29,  1.69it/s]
  4%|▍         | 2/50 [00:01<00:28,  1.67it/s]
  6%|▌         | 3/50 [00:01<00:28,  1.66it/s]
  8%|▊         | 4/50 [00:02<00:27,  1.66it/s]
 10%|█         | 5/50 [00:03<00:27,  1.66it/s]
 12%|█▏        | 6/50 [00:03<00:26,  1.65it/s]
 14%|█▍        | 7/50 [00:04<00:26,  1.65it/s]
 16%|█▌        | 8/50 [00:04<00:25,  1.65it/s]
 18%|█▊        | 9/50 [00:05<00:24,  1.65it/s]
 20%|██        | 10/50 [00:06<00:24,  1.65it/s]
 22%|██▏       | 11/50 [00:06<00:23,  1.65it/s]
 24%|██▍       | 12/50 [00:07<00:23,  1.65it/s]
 26%|██▌       | 13/50 [00:07<00:22,  1.65it/s]
 28%|██▊       | 14/50 [00:08<00:21,  1.65it/s]
 30%|███       | 15/50 [00:09<00:21,  1.65it/s]
 32%|███▏      | 16/50 [00:09<00:20,  1.65it/s]
 34%|███▍      | 17/50 [00:10<00:20,  1.64it/s]
 36%|███▌      | 18/50 [00:10<00:19,  1.65it/s]
 38%|███▊      | 19/50 [00:11<00:18,  1.65it/s]
 40%|████      | 20/50 [00:12<00:18,  1.65it/s]
 42%|████▏     | 21/50 [00:12<00:17,  1.65it/s]
 44%|████▍     | 22/50 [00:13<00:17,  1.65it/s]
 46%|████▌     | 23/50 [00:13<00:16,  1.65it/s]
 48%|████▊     | 24/50 [00:14<00:15,  1.65it/s]
 50%|█████     | 25/50 [00:15<00:15,  1.65it/s]
 52%|█████▏    | 26/50 [00:15<00:14,  1.65it/s]
 54%|█████▍    | 27/50 [00:16<00:13,  1.65it/s]
 56%|█████▌    | 28/50 [00:16<00:13,  1.65it/s]
 58%|█████▊    | 29/50 [00:17<00:12,  1.65it/s]
 60%|██████    | 30/50 [00:18<00:12,  1.65it/s]
 62%|██████▏   | 31/50 [00:18<00:11,  1.65it/s]
 64%|██████▍   | 32/50 [00:19<00:10,  1.65it/s]
 66%|██████▌   | 33/50 [00:20<00:10,  1.65it/s]
 68%|██████▊   | 34/50 [00:20<00:09,  1.65it/s]
 70%|███████   | 35/50 [00:21<00:09,  1.65it/s]
 72%|███████▏  | 36/50 [00:21<00:08,  1.65it/s]
 74%|███████▍  | 37/50 [00:22<00:07,  1.65it/s]
 76%|███████▌  | 38/50 [00:23<00:07,  1.65it/s]
 78%|███████▊  | 39/50 [00:23<00:06,  1.65it/s]
 80%|████████  | 40/50 [00:24<00:06,  1.65it/s]
 82%|████████▏ | 41/50 [00:24<00:05,  1.65it/s]
 84%|████████▍ | 42/50 [00:25<00:04,  1.65it/s]
 86%|████████▌ | 43/50 [00:26<00:04,  1.65it/s]
 88%|████████▊ | 44/50 [00:26<00:03,  1.65it/s]
 90%|█████████ | 45/50 [00:27<00:03,  1.64it/s]
 92%|█████████▏| 46/50 [00:27<00:02,  1.65it/s]
 94%|█████████▍| 47/50 [00:28<00:01,  1.64it/s]
 96%|█████████▌| 48/50 [00:29<00:01,  1.64it/s]
 98%|█████████▊| 49/50 [00:29<00:00,  1.64it/s]
100%|██████████| 50/50 [00:30<00:00,  1.64it/s]
100%|██████████| 50/50 [00:30<00:00,  1.65it/s]
Generation took 30.55 seconds
Total safe images: 1/1
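
The logs above carry the resolved image size, the seed, and the generation time, which is enough to reproduce a run or estimate throughput. The following sketch extracts those values with regular expressions; `parse_log` is a hypothetical helper, and the patterns assume the log wording shown in this example, which may differ between versions.

```python
import re


def parse_log(log_text: str) -> dict:
    """Pull the resolved size, seed, and generation time out of a log dump."""
    info = {}
    preset = re.search(r"preset for (\d+):(\d+): (\d+)x(\d+)", log_text)
    if preset:
        ratio_w, ratio_h, width, height = map(int, preset.groups())
        info["width"], info["height"] = width, height
        info["requested_ratio"] = ratio_w / ratio_h
        info["actual_ratio"] = width / height
    seed = re.search(r"Using random seed: (\d+)", log_text)
    if seed:
        info["seed"] = int(seed.group(1))
    took = re.search(r"Generation took ([\d.]+) seconds", log_text)
    if took:
        info["seconds"] = float(took.group(1))
    return info


# Lines abridged from the example logs on this page.
sample = """Using quality preset for 16:9: 1664x928
Using random seed: 2565513562
Generation took 30.55 seconds"""

info = parse_log(sample)
print(info["width"], info["height"], info["seed"])
# 50 denoising steps were requested in this example run.
print(round(50 / info["seconds"], 2), "steps per second overall")
```

The preset's actual ratio (1664/928) is close to but not exactly 16:9, so code that depends on exact dimensions should read them from the logs or the output image rather than derive them from `aspect_ratio`.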
Version Details
Version ID
04c2e2d513ab19063e5ba401e322feb9ad8ca9150b8c5e6417a656e719edbd55
Version Created
August 20, 2025