Image-to-poster generation is a multi-dimensional process that couples entity-preserving local editing (such as rescaling, filling, and extending) with concept-driven global creation (such as layout-driven and style-driven generation).
We propose PosterOmni, a generalized framework that unifies these regimes via an efficient data–distillation–reward pipeline. Our approach involves constructing multi-scenario datasets covering six task types, distilling knowledge from specialized experts, and applying Unified Reward Feedback to align outcomes with aesthetic preferences. Extensive experiments show that PosterOmni significantly outperforms existing baselines in both fidelity and design quality.
Performs precise local adjustments including extending, filling, rescaling, and identity-driven generation while preserving the original subject.
Handles abstract high-level tasks such as layout-driven and style-driven generation, ensuring aesthetic coherence across the entire poster.
Seamlessly integrates multiple editing and generation capabilities into a single model without switching pipelines.
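The split between the two regimes can be made concrete as a small task registry. This is a minimal sketch that assumes a single entry point taking a task name and a free-text instruction. The `Regime`, `PosterTask`, and `build_request` names and the request format are hypothetical and are not PosterOmni's interface. The grouping itself follows the descriptions above: extending, filling, rescaling, and identity-driven generation are local edits, while layout-driven and style-driven generation are global creation.

```python
from dataclasses import dataclass
from enum import Enum


class Regime(Enum):
    LOCAL_EDITING = "local"     # entity-preserving edits
    GLOBAL_CREATION = "global"  # concept-driven creation


@dataclass(frozen=True)
class PosterTask:
    name: str
    regime: Regime


# The six task types, grouped by regime as described above.
TASKS = {
    task.name: task
    for task in (
        PosterTask("extending", Regime.LOCAL_EDITING),
        PosterTask("filling", Regime.LOCAL_EDITING),
        PosterTask("rescaling", Regime.LOCAL_EDITING),
        PosterTask("identity-driven", Regime.LOCAL_EDITING),
        PosterTask("layout-driven", Regime.GLOBAL_CREATION),
        PosterTask("style-driven", Regime.GLOBAL_CREATION),
    )
}


def build_request(task_name: str, instruction: str) -> dict:
    """Package any of the six tasks for one shared model (hypothetical request format)."""
    task = TASKS[task_name]  # raises KeyError for an unknown task
    return {"task": task.name, "regime": task.regime.value, "instruction": instruction}


print(build_request("style-driven", "Restyle this poster as a vintage travel print"))
```

Every task goes through the same `build_request` call, so the caller never switches pipelines. The `regime` field is only metadata that a unified model could condition on.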
Leverages GPT-4 and Qwen to generate diverse, structured prompts and initial images covering various themes.
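A structured prompt is easier to check downstream than free text because the exact poster text is stored in its own field. The sketch below assumes the language model is asked to return a JSON object. The field names, the `PosterPrompt` class, and the prompt template are hypothetical and are not taken from PosterOmni.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class PosterPrompt:
    """Hypothetical schema for one structured poster prompt."""
    theme: str                       # e.g. "music festival"
    title: str                       # exact text that must appear on the poster
    subtitle: str = ""
    style: str = "flat illustration"
    elements: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Flatten the structure into a single text-to-image prompt."""
        parts = [f"A {self.style} poster about {self.theme}",
                 f'with the title "{self.title}"']
        if self.subtitle:
            parts.append(f'and the subtitle "{self.subtitle}"')
        if self.elements:
            parts.append("featuring " + ", ".join(self.elements))
        return " ".join(parts) + "."


def parse_llm_output(raw: str) -> PosterPrompt:
    """Validate a JSON object returned by the prompt-writing model; unknown keys raise TypeError."""
    return PosterPrompt(**json.loads(raw))


print(parse_llm_output('{"theme": "jazz night", "title": "Blue Notes"}').to_text())
```

Keeping `title` as a separate field means an OCR-based filtering step can compare the recognized text against it directly.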
Employs OCR and VLM-based filtering to ensure textual correctness and layout-content consistency.
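The OCR part of this filter reduces to comparing the text recognized in a generated poster with the text the prompt asked for. This is a minimal sketch that assumes the OCR engine and the VLM have already run and returned a string and a consistency score in [0, 1]. The `keep_sample` helper, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and both thresholds are illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so harmless OCR spacing noise is ignored."""
    return " ".join(text.lower().split())


def text_accuracy(intended: str, recognized: str) -> float:
    """Character-level similarity in [0, 1] between the requested and the recognized text."""
    return SequenceMatcher(None, normalize(intended), normalize(recognized)).ratio()


def keep_sample(intended: str, recognized: str, vlm_consistency: float,
                min_text_acc: float = 0.9, min_consistency: float = 0.5) -> bool:
    """Keep a sample only if both the text check and the layout-content check pass."""
    return (text_accuracy(intended, recognized) >= min_text_acc
            and vlm_consistency >= min_consistency)


print(keep_sample("Summer Sale", "SUMMER  SALE", vlm_consistency=0.8))
print(keep_sample("Summer Sale", "Winter Fair", vlm_consistency=0.8))
```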
Automatically synthesizes paired data for the six tasks using tools such as SAM-2 and BrushNet.
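Paired data for the local-editing tasks can be derived from a single finished poster by degrading it: the degraded version becomes the input and the original becomes the target. The sketch below covers only the geometry for two tasks: extending (show a centered crop, restore the full canvas) and filling (mask a region, restore it). It assumes the region comes from a segmentation model such as SAM-2, but it does not call SAM-2 or BrushNet. `PairSpec`, `extending_pair`, `filling_pair`, and the default crop ratio are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class PairSpec:
    task: str
    source_box: Box          # region of the original poster the model is shown
    mask_box: Optional[Box]  # region the model must regenerate, if any


def extending_pair(width: int, height: int, keep: float = 0.6) -> PairSpec:
    """Outpainting pair: the input is a centered crop, the target is the full original canvas."""
    if not 0.0 < keep <= 1.0:
        raise ValueError("keep must be in (0, 1]")
    crop_w, crop_h = int(width * keep), int(height * keep)
    left, top = (width - crop_w) // 2, (height - crop_h) // 2
    return PairSpec("extending", (left, top, left + crop_w, top + crop_h), None)


def filling_pair(width: int, height: int, region: Box) -> PairSpec:
    """Inpainting pair: `region` (e.g. a segmentation box) is blanked and must be restored."""
    left, top, right, bottom = region
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("region must lie inside the image")
    return PairSpec("filling", (0, 0, width, height), region)


print(extending_pair(1000, 1500, keep=0.5))
```

Because the target is always the untouched original, such pairs come with exact ground truth at no labeling cost.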
Trains specialized experts for local editing and global creation to ensure high fidelity in distinct domains.
Distills knowledge from experts into a unified student model, merging pixel precision with aesthetic understanding.
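One common way to merge several teachers into one student is to route each training sample to the expert for its task and penalize the student's deviation from that expert's output. The page does not specify PosterOmni's distillation objective, so this sketch rests on that assumption: a hard task-to-expert route and a mean-squared-error loss on model outputs, with plain Python lists standing in for tensors. `distillation_loss` and the `"local"`/`"global"` expert keys are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Model = Callable[[Vector], Vector]

# Tasks routed to the local-editing expert; all others go to the global-creation expert.
LOCAL_TASKS = {"extending", "filling", "rescaling", "identity-driven"}


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student: Model, experts: Dict[str, Model],
                      batch: List[Tuple[str, Vector]]) -> float:
    """Average MSE between the student and the expert assigned to each sample's task."""
    total = 0.0
    for task, x in batch:
        teacher = experts["local" if task in LOCAL_TASKS else "global"]
        total += mse(student(x), teacher(x))
    return total / len(batch)
```

In a real training loop the teacher outputs would be computed without gradients and the loss minimized only with respect to the student's parameters. Routing by task is what lets one student absorb both experts' behavior.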
Aligns with human preferences using a reward model that evaluates both aesthetic appeal and instruction adherence.
Uses reinforcement learning to refine generation quality and align it with professional design standards.
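Reward feedback of this kind typically has two steps: collapse the separate judgments into one scalar, then turn those scalars into a learning signal. The page names neither the weighting nor the RL algorithm. The sketch below therefore assumes a linear blend of an aesthetic score and an instruction-adherence score, followed by GRPO-style group normalization, where several images are sampled for the same prompt and each reward is standardized against its group. `unified_reward`, `group_advantages`, and the default weight are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def unified_reward(aesthetic: float, adherence: float, w_aesthetic: float = 0.5) -> float:
    """Linear blend of an aesthetic score and an instruction-adherence score (assumed weighting)."""
    return w_aesthetic * aesthetic + (1.0 - w_aesthetic) * adherence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against its own sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three samples generated for the same prompt, each scored as (aesthetic, adherence).
scores = [(0.9, 0.7), (0.5, 0.5), (0.1, 0.3)]
rewards = [unified_reward(a, b) for a, b in scores]
print(group_advantages(rewards))
```

Images scoring above their group's mean receive positive advantages and are reinforced; those below are suppressed. Normalizing within the group removes per-prompt differences in difficulty, and `eps` guards against a group whose rewards are all equal.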
| Model | Extending | Filling | Rescaling | Identity-consistent | Layout-driven | Style-driven | Overall |
|---|---|---|---|---|---|---|---|
| ICEdit | 1.99 / - | 3.21 / - | 1.73 / - | 1.59 / - | 1.53 / - | 1.67 / - | 1.95 / - |
| Step1X-Edit | 3.04 / 3.67 | 4.35 / 4.21 | 1.60 / 1.75 | 1.70 / 2.14 | 1.63 / 1.82 | 1.57 / 1.79 | 2.31 / 2.56 |
| BAGEL | 2.33 / 2.84 | 2.77 / 2.67 | 1.77 / 1.40 | 1.92 / 2.29 | 2.34 / 3.03 | 1.85 / 2.34 | 2.15 / 2.43 |
| OmniGen2 | 2.56 / - | 2.32 / - | 1.61 / - | 3.25 / - | 2.22 / - | 1.84 / - | 2.59 / - |
| FLUX.1 Kontext | 3.12 / - | 3.61 / - | 3.16 / - | 3.39 / - | 3.03 / - | 2.88 / - | 3.20 / - |
| Qwen-Image-Edit | 4.28 / 4.24 | 3.95 / 3.79 | 3.40 / 3.54 | 3.06 / 3.37 | 3.44 / 2.97 | 2.91 / 2.83 | 3.51 / 3.46 |
| UniWorld-V2 | 4.25 / 4.22 | 3.57 / 3.18 | 3.07 / 3.23 | 2.87 / 3.20 | 3.66 / 3.79 | 3.14 / 2.85 | 3.42 / 3.41 |
| Seedream-3.0 | 3.52 / 3.76 | 3.40 / 3.52 | 2.38 / 2.84 | 2.88 / 3.30 | 2.68 / 3.04 | 2.32 / 2.82 | 2.86 / 3.21 |
| Seedream-4.0 | 4.41 / 4.57 | 4.44 / 4.64 | 4.00 / 3.69 | 4.53 / 4.62 | 4.05 / 4.22 | 4.23 / 4.31 | 4.28 / 4.34 |
| PosterOmni (Ours) | 4.76 / 4.72 | 4.69 / 4.77 | 3.97 / 3.81 | 3.98 / 4.23 | 4.20 / 4.35 | 3.99 / 4.36 | 4.27 / 4.37 |
| vs. baseline (Qwen-Image-Edit) | +0.48 / +0.48 | +0.74 / +0.98 | +0.57 / +0.27 | +0.92 / +0.86 | +0.76 / +1.38 | +1.08 / +1.53 | +0.76 / +0.91 |
Table 1: Quantitative comparison results on PosterOmni-Bench. Gold marks the best result and blue the second best.
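The last row of Table 1 is simple arithmetic on two earlier rows: PosterOmni's score minus Qwen-Image-Edit's score in each column. The Overall column also appears to be the unweighted mean of the six task scores, which holds, up to rounding, for both of those rows. The snippet below reproduces the first score of each cell in the comparison row from the table values. The table does not say what the two scores per cell denote, so only the first is used; the variable and function names are illustrative.

```python
from typing import List, Sequence

TASK_COLUMNS = ["Extending", "Filling", "Rescaling",
                "Identity-consistent", "Layout-driven", "Style-driven"]

# First score of each cell in Table 1 for the two rows being compared.
QWEN_IMAGE_EDIT = [4.28, 3.95, 3.40, 3.06, 3.44, 2.91]
POSTEROMNI = [4.76, 4.69, 3.97, 3.98, 4.20, 3.99]


def deltas(ours: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Per-task improvement over the baseline, rounded to two decimals like the table."""
    return [round(o - b, 2) for o, b in zip(ours, baseline)]


def overall(scores: Sequence[float]) -> float:
    """Unweighted mean over the six task scores."""
    return sum(scores) / len(scores)


for column, delta in zip(TASK_COLUMNS, deltas(POSTEROMNI, QWEN_IMAGE_EDIT)):
    print(f"{column}: {delta:+.2f}")
print(f"Overall: {overall(POSTEROMNI) - overall(QWEN_IMAGE_EDIT):+.2f}")
```

The printed values match the first score in each cell of the comparison row: +0.48, +0.74, +0.57, +0.92, +0.76, +1.08, and +0.76 overall.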