bytedance/bagel

278.4K runs, May 2025, Cog 0.14.9

image-editing image-to-text text-to-image

Performance

Typical run time: 73.9s

Total runs: 278.4K

About

🥯 ByteDance Seed's Bagel: a unified multimodal model that generates images, edits images, and understands images in a single 7B-parameter model 🥯

Example Output

Prompt:

"She boards a modern London Tube, quietly reading a folded newspaper, wearing the same clothes"

Output

{"text":null,"image":"https://replicate.delivery/xezq/KWWpk9oeorSXJKisnvRRdSgxif9d06XNC1zAi2sreWeJ6NelC/output.webp"}
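The output is a JSON object with a `text` field and an `image` field, and in this image-editing run only `image` is set. A minimal sketch of reading such a payload is shown here. It assumes that understanding tasks fill `text` instead, which this page does not show, and `pick_result` is a hypothetical helper name.

```python
import json


def pick_result(raw: str) -> tuple[str, str]:
    """Return ("image", url) or ("text", answer) from a JSON output string."""
    payload = json.loads(raw)
    if payload.get("image"):
        return "image", payload["image"]
    # Assumption: image-to-text runs populate "text" and leave "image" null.
    if payload.get("text"):
        return "text", payload["text"]
    raise ValueError("output has neither an image nor text")


# The example output from this page, copied verbatim.
example = '{"text":null,"image":"https://replicate.delivery/xezq/KWWpk9oeorSXJKisnvRRdSgxif9d06XNC1zAi2sreWeJ6NelC/output.webp"}'
kind, value = pick_result(example)
print(kind, value)
```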

Performance Metrics

Prediction Time: 73.93s

Total Time: 83.02s

All Input Parameters

{
  "task": "image-editing",
  "image": "https://replicate.delivery/pbxt/N3j6lENyFDxARhorZ8yY86qhIF1uMuvEMO1KytosXUxaz0EO/image.png",
  "prompt": "She boards a modern London Tube, quietly reading a folded newspaper, wearing the same clothes",
  "cfg_img_scale": 2,
  "output_format": "webp",
  "cfg_renorm_min": 1,
  "cfg_text_scale": 4,
  "output_quality": 90,
  "timestep_shift": 3,
  "cfg_renorm_type": "text_channel",
  "enable_thinking": false,
  "num_inference_steps": 50
}
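As a rough sketch, the same parameters could be sent with the `replicate` Python client. The helper names `build_edit_input` and `run_edit` are made up for illustration, the version ID is the one listed under Version Details, and the call needs the third-party `replicate` package plus a `REPLICATE_API_TOKEN` environment variable. The shape of the returned value may differ between client versions, so treat it as an assumption to verify.

```python
# Model reference: owner/name plus the version ID from Version Details.
MODEL = "bytedance/bagel:7dd8def79e503990740db4704fa81af995d440fefe714958531d7044d2757c9c"


def build_edit_input(image_url: str, prompt: str) -> dict:
    """Build an image-editing input that mirrors the example parameters."""
    return {
        "task": "image-editing",
        "image": image_url,
        "prompt": prompt,
        "cfg_img_scale": 2,
        "output_format": "webp",
        "cfg_renorm_min": 1,
        "cfg_text_scale": 4,
        "output_quality": 90,
        "timestep_shift": 3,
        "cfg_renorm_type": "text_channel",
        "enable_thinking": False,
        "num_inference_steps": 50,
    }


def run_edit(image_url: str, prompt: str):
    """Send the request; imported lazily so the builder works offline."""
    import replicate  # third-party client, reads REPLICATE_API_TOKEN

    return replicate.run(MODEL, input=build_edit_input(image_url, prompt))
```

Calling `run_edit(...)` performs a paid network request, so the payload builder is kept separate and can be inspected without one.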

Input Parameters

seed (Type: integer): Random seed for reproducible results
task (Default: text-to-image): Task to perform
image (Type: string): Input image for editing or understanding tasks
prompt (required, Type: string): Text prompt for generation, editing, or understanding
cfg_img_scale (Type: number, Default: 1.5, Range: 1 - 10): Image guidance scale for preserving input image details
output_format (Default: webp): Output image format
cfg_renorm_min (Type: number, Default: 1, Range: 0 - 1): Minimum CFG renorm value
cfg_text_scale (Type: number, Default: 4, Range: 1 - 20): Text guidance scale for how closely to follow the prompt
output_quality (Type: integer, Default: 90, Range: 1 - 100): Image compression quality for lossy formats
timestep_shift (Type: number, Default: 3, Range: 1 - 10): Distribution of denoising steps between composition and details
cfg_renorm_type (Default: global): CFG renormalization method
enable_thinking (Type: boolean, Default: false): Enable chain-of-thought reasoning for better results
num_inference_steps (Type: integer, Default: 50, Range: 1 - 100): Number of denoising steps
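The documented ranges can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight check, not part of the model: `RANGES` and `check_input` are illustrative names, and the bounds are assumed to be inclusive, which the page does not state.

```python
# Numeric ranges copied from the parameter list; assumed inclusive.
RANGES = {
    "cfg_img_scale": (1, 10),
    "cfg_renorm_min": (0, 1),
    "cfg_text_scale": (1, 20),
    "output_quality": (1, 100),
    "timestep_shift": (1, 10),
    "num_inference_steps": (1, 100),
}


def check_input(params: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    if not params.get("prompt"):
        problems.append("prompt is required")
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low}-{high}")
    return problems


print(check_input({"prompt": "a bagel on a plate", "cfg_text_scale": 4}))
print(check_input({"cfg_text_scale": 25}))
```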

Output Schema

Example Execution Logs

[+] Using seed: 60309
[+] Loaded input image: (1534, 1968)
[+] Running image editing
[+] Processing prompt: She boards a modern London Tube, quietly reading a folded newspaper, wearing the same clothes
[+] Generated 800x1024 image saved as WEBP
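Because the logs print the seed that was used, it can be recovered and passed back as the `seed` input on a later run. The sketch below assumes the log line keeps the "Using seed: N" wording shown above; `seed_from_logs` is a hypothetical helper, and whether a fixed seed gives bit-identical images is not confirmed by this page.

```python
import re


def seed_from_logs(logs: str):
    """Return the integer after "Using seed:", or None if no such line exists."""
    match = re.search(r"Using seed:\s*(\d+)", logs)
    return int(match.group(1)) if match else None


logs = """[+] Using seed: 60309
[+] Loaded input image: (1534, 1968)
[+] Running image editing"""
print(seed_from_logs(logs))
```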

Version Details

Version ID: 7dd8def79e503990740db4704fa81af995d440fefe714958531d7044d2757c9c
Version Created: May 23, 2025
