qwen/qwen-image-2-pro

Official model · 57.0K runs · March 2026 · Cog 0.16.12

Tags: image-editing, image-to-image, text-to-image

Performance

Typical run time: 9.2 s

Total runs: 57.0K

About

The Pro version of Qwen Image 2 from Alibaba's Qwen team. It offers enhanced text rendering, realism, and semantic adherence for high-quality image generation and editing.

Example Output

Prompt:

A wide-angle smartphone photograph of a modern glass whiteboard mounted on a wall inside a bright, airy office room with floor-to-ceiling windows overlooking the Great Wall of China winding across misty mountain ridges at golden hour — warm sunlight casts soft reflections and long shadows across the scene.
Centered in the frame, a woman in her late 20s wearing a relaxed-fit white t-shirt prominently featuring a sleek “Qwen-Image” logo in gradient blue typography is writing on the board with a fine-tip magnetic stylus.
Her handwriting is natural, slightly imperfect, and expressive — with visible pressure variation, subtle smudges, and organic line weight — conveying authentic human authorship.
In the lower-left corner of the glass surface, the photographer’s faint but unmistakable reflection appears: blurred outline of a person holding a phone at arm’s length, capturing the moment.

On the left side of the whiteboard, clean, legible handwritten text appears in dark gray marker with exceptional stroke fidelity:
’Qwen-Image-2.0 Core Innovations:
• Complex Typography Engine: 1K-token instruction support for professional PPTs, posters & infographics — pixel-perfect multi-script layout, sophisticated text-image composition, and complete rendering of large-volume textual content
• Extreme Photorealism: Native 2K resolution (2048×2048) with microscopic detail on skin pores, fabric weave, architectural textures & natural foliage
• Unified Omni Model: Generation + editing in one model — full-stack multimodal understanding and generation capabilities seamlessly integrated
• 7B Efficiency: 2K image generation in seconds — optimal balance between visual fidelity and inference speed’

On the right side of the whiteboard, vertically aligned technical notes in crisp marker:
’Why It Matters:
→ One model delivers photorealistic imagery AND pixel-perfect text rendering simultaneously
→ One model powers both text-to-image generation AND precise image editing without pipeline switching
→ One model unifies deep multimodal understanding AND high-fidelity generation in a single 7B architecture’

In the bottom-right corner, a hand-drawn schematic in precise strokes:
’[8B Qwen3-VL Encoder] → [7B Diffusion Decoder] → pixels (2048×2048)’
— arrows flow with perspective depth, boxes feature soft shading, resolution specs annotated in fine print.

The glass surface exhibits realistic optical properties.
Background includes minimalist wooden shelving with design magazines open to full-bleed infographics — one prominently displays a crisp cover reading “Qwen 3.5” in bold modern typography — and a potted fiddle-leaf fig with individually rendered leaf veins partially visible out-of-focus.

Output

Performance Metrics

Prediction time: 9.22 s

Total time: 9.23 s

All Input Parameters

{
  "prompt": "A wide-angle smartphone photograph of a modern glass whiteboard mounted on a wall inside a bright, airy office room with floor-to-ceiling windows overlooking the Great Wall of China winding across misty mountain ridges at golden hour — warm sunlight casts soft reflections and long shadows across the scene.\nCentered in the frame, a woman in her late 20s wearing a relaxed-fit white t-shirt prominently featuring a sleek “Qwen-Image” logo in gradient blue typography is writing on the board with a fine-tip magnetic stylus.\nHer handwriting is natural, slightly imperfect, and expressive — with visible pressure variation, subtle smudges, and organic line weight — conveying authentic human authorship.\nIn the lower-left corner of the glass surface, the photographer’s faint but unmistakable reflection appears: blurred outline of a person holding a phone at arm’s length, capturing the moment.\n\nOn the left side of the whiteboard, clean, legible handwritten text appears in dark gray marker with exceptional stroke fidelity:\n’Qwen-Image-2.0 Core Innovations:\n• Complex Typography Engine: 1K-token instruction support for professional PPTs, posters & infographics — pixel-perfect multi-script layout, sophisticated text-image composition, and complete rendering of large-volume textual content\n• Extreme Photorealism: Native 2K resolution (2048×2048) with microscopic detail on skin pores, fabric weave, architectural textures & natural foliage\n• Unified Omni Model: Generation + editing in one model — full-stack multimodal understanding and generation capabilities seamlessly integrated\n• 7B Efficiency: 2K image generation in seconds — optimal balance between visual fidelity and inference speed’\n\nOn the right side of the whiteboard, vertically aligned technical notes in crisp marker:\n’Why It Matters:\n→ One model delivers photorealistic imagery AND pixel-perfect text rendering simultaneously\n→ One model powers both text-to-image generation AND precise image editing without pipeline switching\n→ One model unifies deep multimodal understanding AND high-fidelity generation in a single 7B architecture’\n\nIn the bottom-right corner, a hand-drawn schematic in precise strokes:\n’[8B Qwen3-VL Encoder] → [7B Diffusion Decoder] → pixels (2048×2048)’\n— arrows flow with perspective depth, boxes feature soft shading, resolution specs annotated in fine print.\n\nThe glass surface exhibits realistic optical properties.\nBackground includes minimalist wooden shelving with design magazines open to full-bleed infographics — one prominently displays a crisp cover reading “Qwen 3.5” in bold modern typography — and a potted fiddle-leaf fig with individually rendered leaf veins partially visible out-of-focus.",
  "aspect_ratio": "1:1",
  "negative_prompt": "",
  "match_input_image": false,
  "enable_prompt_expansion": false
}

Input Parameters

seed (integer): Random seed for reproducible generation. Range: 0-2147483647.
image (string): Optional reference image for image editing, style transfer, or image-to-image generation.
prompt (string, required): Text prompt for image generation or editing.
aspect_ratio (default: "1:1"): Aspect ratio of the generated image.
negative_prompt (string, default: ""): Negative prompt specifying elements to avoid in the generated image.
match_input_image (boolean, default: false): When true and an image is provided, use the input image's aspect ratio and resolution instead of the aspect_ratio parameter.
enable_prompt_expansion (boolean, default: true): Automatically expand and optimize the prompt for better results.
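The parameter list above can be mirrored on the client before a request is sent. The following is a minimal sketch, assuming the request payload is a plain Python dictionary; the helpers `build_input` and `effective_aspect_ratio` are hypothetical and are not part of any Replicate or Qwen API. Defaults and the seed range are copied from the list above, and the model performs its own validation regardless.

```python
from typing import Any, Optional

# Seed bounds and defaults copied from the documented input parameters.
SEED_MIN, SEED_MAX = 0, 2147483647
DEFAULTS: dict[str, Any] = {
    "aspect_ratio": "1:1",
    "negative_prompt": "",
    "match_input_image": False,
    "enable_prompt_expansion": True,
}
# Optional parameters that have no documented default.
OPTIONAL = {"seed", "image"}


def build_input(prompt: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge documented defaults with caller overrides."""
    if not prompt:
        raise ValueError("prompt is required")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params: dict[str, Any] = {"prompt": prompt, **DEFAULTS, **overrides}
    seed: Optional[int] = params.get("seed")
    if seed is not None and not (SEED_MIN <= seed <= SEED_MAX):
        raise ValueError(f"seed must be in [{SEED_MIN}, {SEED_MAX}]")
    return params


def effective_aspect_ratio(params: dict[str, Any]) -> str:
    """Report which setting controls output shape, per the match_input_image description."""
    if params.get("match_input_image") and params.get("image"):
        return "input image"
    return params["aspect_ratio"]
```

For example, `build_input("A glass whiteboard in a bright office", seed=42)` fills in the remaining defaults, and `effective_aspect_ratio` on that payload returns `"1:1"`. Keeping the defaults in one dictionary makes it easy to see when a payload (such as the example above, which sets `enable_prompt_expansion` to false) deviates from them.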

Output Schema

Output

Type: string • Format: uri

Example Execution Logs

Using seed: 437034509
Prompt length 2726 exceeds 2000 characters, truncating
Generating image with qwen-image-2.0-pro (text-to-image)...
Sending request to qwen-image-2.0-pro with size=1024*1024
Response status: stop
Generated 1 image(s) in 7.1sec
Downloading 1437891 bytes
Downloaded 1.37MB in 1.96sec
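The log above reports that the 2726-character example prompt exceeded a 2000-character limit and was truncated, so part of the prompt shown earlier never reached the model. A caller can detect this before submitting. The sketch below assumes the limit is counted in Python characters (code points) and that truncation keeps the leading characters; both are inferred from the single log line, not documented behavior. `check_prompt_length` is a hypothetical helper.

```python
import warnings

# Limit taken from the example log line; assumed, not part of the documented schema.
PROMPT_CHAR_LIMIT = 2000


def check_prompt_length(prompt: str, limit: int = PROMPT_CHAR_LIMIT) -> str:
    """Warn and return a head-truncated copy if the prompt exceeds the limit.

    Keeping the first `limit` characters is an assumption about how the
    service truncates; the model page does not confirm it.
    """
    if len(prompt) <= limit:
        return prompt
    warnings.warn(
        f"prompt length {len(prompt)} exceeds {limit} characters; "
        f"the last {len(prompt) - limit} characters will likely be dropped"
    )
    return prompt[:limit]
```

Running the check locally lets you shorten or reorder a long prompt yourself, so the details that matter most stay within the first 2000 characters instead of being cut off server-side.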

Version Details

Version ID: ece119c87ba75f46154c80282a3b05db3d88e708d09bbe3d772b9b95fb6f3be6
Version Created: March 4, 2026
