qwen/qwen-image-2-pro 🔢🖼️📝❓✓ → 🖼️
About
The Pro version of Qwen Image 2 from Alibaba's Qwen team, with enhanced text rendering, realism, and semantic adherence for high-quality image generation and editing.
Example Output
A wide-angle smartphone photograph of a modern glass whiteboard mounted on a wall inside a bright, airy office room with floor-to-ceiling windows overlooking the Great Wall of China winding across misty mountain ridges at golden hour — warm sunlight casts soft reflections and long shadows across the scene.
Centered in the frame, a woman in her late 20s wearing a relaxed-fit white t-shirt prominently featuring a sleek “Qwen-Image” logo in gradient blue typography is writing on the board with a fine-tip magnetic stylus.
Her handwriting is natural, slightly imperfect, and expressive — with visible pressure variation, subtle smudges, and organic line weight — conveying authentic human authorship.
In the lower-left corner of the glass surface, the photographer’s faint but unmistakable reflection appears: blurred outline of a person holding a phone at arm’s length, capturing the moment.
On the left side of the whiteboard, clean, legible handwritten text appears in dark gray marker with exceptional stroke fidelity:
’Qwen-Image-2.0 Core Innovations:
• Complex Typography Engine: 1K-token instruction support for professional PPTs, posters & infographics — pixel-perfect multi-script layout, sophisticated text-image composition, and complete rendering of large-volume textual content
• Extreme Photorealism: Native 2K resolution (2048×2048) with microscopic detail on skin pores, fabric weave, architectural textures & natural foliage
• Unified Omni Model: Generation + editing in one model — full-stack multimodal understanding and generation capabilities seamlessly integrated
• 7B Efficiency: 2K image generation in seconds — optimal balance between visual fidelity and inference speed’
On the right side of the whiteboard, vertically aligned technical notes in crisp marker:
’Why It Matters:
→ One model delivers photorealistic imagery AND pixel-perfect text rendering simultaneously
→ One model powers both text-to-image generation AND precise image editing without pipeline switching
→ One model unifies deep multimodal understanding AND high-fidelity generation in a single 7B architecture’
In the bottom-right corner, a hand-drawn schematic in precise strokes:
’[8B Qwen3-VL Encoder] → [7B Diffusion Decoder] → pixels (2048×2048)’
— arrows flow with perspective depth, boxes feature soft shading, resolution specs annotated in fine print.
The glass surface exhibits realistic optical properties.
Background includes minimalist wooden shelving with design magazines open to full-bleed infographics — one prominently displays a crisp cover reading “Qwen 3.5” in bold modern typography — and a potted fiddle-leaf fig with individually rendered leaf veins partially visible out-of-focus.
All Input Parameters
{
  "prompt": "A wide-angle smartphone photograph of a modern glass whiteboard mounted on a wall inside a bright, airy office room with floor-to-ceiling windows overlooking the Great Wall of China winding across misty mountain ridges at golden hour — warm sunlight casts soft reflections and long shadows across the scene.\nCentered in the frame, a woman in her late 20s wearing a relaxed-fit white t-shirt prominently featuring a sleek “Qwen-Image” logo in gradient blue typography is writing on the board with a fine-tip magnetic stylus.\nHer handwriting is natural, slightly imperfect, and expressive — with visible pressure variation, subtle smudges, and organic line weight — conveying authentic human authorship.\nIn the lower-left corner of the glass surface, the photographer’s faint but unmistakable reflection appears: blurred outline of a person holding a phone at arm’s length, capturing the moment.\n\nOn the left side of the whiteboard, clean, legible handwritten text appears in dark gray marker with exceptional stroke fidelity:\n’Qwen-Image-2.0 Core Innovations:\n• Complex Typography Engine: 1K-token instruction support for professional PPTs, posters & infographics — pixel-perfect multi-script layout, sophisticated text-image composition, and complete rendering of large-volume textual content\n• Extreme Photorealism: Native 2K resolution (2048×2048) with microscopic detail on skin pores, fabric weave, architectural textures & natural foliage\n• Unified Omni Model: Generation + editing in one model — full-stack multimodal understanding and generation capabilities seamlessly integrated\n• 7B Efficiency: 2K image generation in seconds — optimal balance between visual fidelity and inference speed’\n\nOn the right side of the whiteboard, vertically aligned technical notes in crisp marker:\n’Why It Matters:\n→ One model delivers photorealistic imagery AND pixel-perfect text rendering simultaneously\n→ One model powers both text-to-image generation AND precise image editing without pipeline switching\n→ One model unifies deep multimodal understanding AND high-fidelity generation in a single 7B architecture’\n\nIn the bottom-right corner, a hand-drawn schematic in precise strokes:\n’[8B Qwen3-VL Encoder] → [7B Diffusion Decoder] → pixels (2048×2048)’\n— arrows flow with perspective depth, boxes feature soft shading, resolution specs annotated in fine print.\n\nThe glass surface exhibits realistic optical properties.\nBackground includes minimalist wooden shelving with design magazines open to full-bleed infographics — one prominently displays a crisp cover reading “Qwen 3.5” in bold modern typography — and a potted fiddle-leaf fig with individually rendered leaf veins partially visible out-of-focus.",
  "aspect_ratio": "1:1",
  "negative_prompt": "",
  "match_input_image": false,
  "enable_prompt_expansion": false
}
Input Parameters
- seed: Random seed for reproducible generation. Range: 0-2147483647
- image: Optional reference image for image editing, style transfer, or image-to-image generation
- prompt (required): Text prompt for image generation or editing
- aspect_ratio: Aspect ratio of the generated image
- negative_prompt: Negative prompt to specify elements to avoid in the generated image
- match_input_image: When true and an image is provided, use the input image's aspect ratio and resolution instead of the aspect_ratio parameter
- enable_prompt_expansion: Automatically expand and optimize the prompt for better results
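The parameters listed above can be assembled into an input object like the one shown under All Input Parameters. The following is a minimal sketch, not the service's own client: `build_input` is a hypothetical helper, its defaults simply mirror the example values on this page (they are not confirmed service defaults), and passing `image` as a URL string is an assumption.

```python
from typing import Any, Dict, Optional

SEED_MAX = 2_147_483_647  # upper bound of the documented seed range


def build_input(
    prompt: str,
    aspect_ratio: str = "1:1",            # value used in the example input
    negative_prompt: str = "",
    match_input_image: bool = False,
    enable_prompt_expansion: bool = False,
    seed: Optional[int] = None,
    image: Optional[str] = None,          # assumption: a URL string
) -> Dict[str, Any]:
    """Assemble an input payload from the documented parameters."""
    if not prompt:
        raise ValueError("prompt is required")
    if seed is not None and not 0 <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in the range 0-{SEED_MAX}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "negative_prompt": negative_prompt,
        "match_input_image": match_input_image,
        "enable_prompt_expansion": enable_prompt_expansion,
    }
    # Optional fields are included only when the caller sets them.
    if seed is not None:
        payload["seed"] = seed
    if image is not None:
        payload["image"] = image
    return payload
```

Fixing `seed` to a value reported in an earlier run's logs is the documented way to reproduce a generation; leaving it unset lets the service pick one.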
Example Execution Logs
Using seed: 437034509
Prompt length 2726 exceeds 2000 characters, truncating
Generating image with qwen-image-2.0-pro (text-to-image)...
Sending request to qwen-image-2.0-pro with size=1024*1024
Response status: stop
Generated 1 image(s) in 7.1sec
Downloading 1437891 bytes
Downloaded 1.37MB in 1.96sec
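The log reports that a 2726-character prompt exceeded a 2000-character limit and was truncated. A pre-flight check along the following lines could flag this before a request is sent. This is only a sketch: the 2000 figure is taken from that single log line rather than from documented limits, `check_prompt_length` is a hypothetical helper, and because the page does not say how the truncation is performed, the function only measures the overflow.

```python
PROMPT_CHAR_LIMIT = 2000  # inferred from the log line; not a documented limit


def check_prompt_length(prompt: str, limit: int = PROMPT_CHAR_LIMIT) -> int:
    """Return the number of characters beyond the limit (0 if the prompt fits)."""
    return max(0, len(prompt) - limit)


# A prompt of the length reported in the log overshoots by 726 characters.
overflow = check_prompt_length("x" * 2726)
print(overflow)  # 726
```

A non-zero result suggests shortening the prompt by hand, so that the instructions placed last are not the ones lost.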
Version Details
- Version ID: ece119c87ba75f46154c80282a3b05db3d88e708d09bbe3d772b9b95fb6f3be6
- Version Created: March 4, 2026