PosterOmni

Generalized Artistic Poster Creation
via Task Distillation and Unified Reward Feedback

1 HKUST(GZ), 2 Meituan, 3 HKUST
Equal Contribution Corresponding Author

Abstract

Image-to-poster generation couples two regimes: entity-preserving local editing (such as rescaling, filling, and extending) and concept-driven global creation (such as layout-driven and style-driven generation).

We propose PosterOmni, a generalized framework that unifies these regimes via an efficient data–distillation–reward pipeline. Our approach constructs multi-scenario datasets covering six task types, distills knowledge from specialized experts into a single model, and applies Unified Reward Feedback to align outputs with aesthetic preferences. On PosterOmni-Bench, PosterOmni improves on its Qwen-Image-Edit baseline across all six tasks (+0.76 / +0.91 overall) and performs on par with Seedream-4.0, the strongest competing system (4.27 / 4.37 vs. 4.28 / 4.34 overall).

Diverse Poster Creation Tasks

1. Local Editing Precision

Performs precise local adjustments including extending, filling, rescaling, and identity-driven generation while preserving the original subject.

2. Global Creation Reasoning

Handles abstract high-level tasks such as layout-driven and style-driven generation, ensuring aesthetic coherence across the entire poster.

3. Unified Framework

Seamlessly integrates multiple editing and generation capabilities into a single model without switching pipelines.

[Figure: PosterOmni capabilities teaser]

Automated Data Construction

1. Prompt & Image Generation

Leverages GPT-4 and Qwen to generate diverse, structured prompts and initial images covering various themes.

2. Multimodal Filtering

Employs OCR and VLM-based filtering to ensure textual correctness and layout-content consistency.

3. Task-Specific Construction

Automatically synthesizes paired data for 6 specific tasks using tools like SAM-2 and BrushNet.

[Figure: Data construction pipeline]
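
The multimodal filtering step can be pictured with a minimal sketch. It assumes the OCR output and the VLM's layout verdict have already been obtained as a string and a boolean; the function names, the sample records, and the 0.9 threshold are illustrative assumptions, not details taken from PosterOmni.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def text_match_score(intended: str, recognized: str) -> float:
    """Similarity in [0, 1] between the intended poster text and the OCR output."""
    return SequenceMatcher(None, normalize(intended), normalize(recognized)).ratio()


def keep_sample(intended: str, recognized: str, layout_ok: bool,
                min_text_score: float = 0.9) -> bool:
    """Keep a sample only if the rendered text is close enough to the intended
    text (OCR check) and the layout was judged consistent (VLM check).
    The threshold is a hypothetical value chosen for this sketch."""
    return text_match_score(intended, recognized) >= min_text_score and layout_ok


# Hypothetical candidates: `ocr` stands in for an OCR engine's output and
# `layout_ok` for a VLM's yes/no judgment on layout-content consistency.
samples = [
    {"id": "a", "intended": "Summer Jazz Festival", "ocr": "summer  jazz festival", "layout_ok": True},
    {"id": "b", "intended": "Summer Jazz Festival", "ocr": "Smr Jz Fstvl", "layout_ok": True},
    {"id": "c", "intended": "Grand Opening", "ocr": "Grand Opening", "layout_ok": False},
]

kept = [s["id"] for s in samples
        if keep_sample(s["intended"], s["ocr"], s["layout_ok"])]
print(kept)
```

Sample "b" is dropped for garbled text and sample "c" for a failed layout check, so a sample has to pass both filters to reach task-specific construction.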

Progressive Training Pipeline

1. Task-Specific SFT

Trains specialized experts for local editing and global creation to ensure high fidelity in distinct domains.

2. Task Distillation

Distills knowledge from experts into a unified student model, merging pixel precision with aesthetic understanding.

3. Unified Reward Feedback

Aligns with human preferences using a reward model that evaluates both aesthetic appeal and instruction adherence.

4. Omni-Edit RL

Uses Reinforcement Learning to refine generation quality and align it with professional design standards.

[Figure: PosterOmni methodology overview]
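
The distillation and reward stages can be sketched in a few lines. The page does not specify the loss, the reward weighting, or the RL algorithm, so everything below is an assumption made for illustration: the task-to-expert routing follows the local/global split described above, the distillation loss is a plain mean squared error, the unified reward is a weighted sum, and the advantage is the reward minus the group mean.

```python
from statistics import mean

# Assumed routing: the four local-editing tasks go to the local expert and the
# two global-creation tasks go to the global expert.
TASK_TO_EXPERT = {
    "extending": "local",
    "filling": "local",
    "rescaling": "local",
    "id_consistency": "local",
    "layout_driven": "global",
    "style_driven": "global",
}


def distillation_loss(student_out, expert_outs, task):
    """MSE between the student's prediction and the output of the expert that
    owns `task` (hypothetical loss; outputs are flat lists of floats here)."""
    target = expert_outs[TASK_TO_EXPERT[task]]
    return mean((s - t) ** 2 for s, t in zip(student_out, target))


def unified_reward(aesthetic, adherence, w_aesthetic=0.5):
    """Weighted sum of an aesthetic score and an instruction-adherence score.
    The equal weighting is an assumption of this sketch."""
    return w_aesthetic * aesthetic + (1.0 - w_aesthetic) * adherence


def group_advantages(rewards):
    """Reward of each candidate relative to the mean of its group, a common
    baseline choice in policy-gradient RL (not confirmed by the source)."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Toy usage with made-up numbers.
expert_outs = {"local": [0.0, 0.0], "global": [1.0, 1.0]}
student_out = [0.0, 2.0]
local_loss = distillation_loss(student_out, expert_outs, "filling")
global_loss = distillation_loss(student_out, expert_outs, "style_driven")
advantages = group_advantages([unified_reward(4.0, 2.0),
                               unified_reward(2.0, 2.0),
                               unified_reward(5.0, 3.0)])
print(local_loss, global_loss, advantages)
```

Routing each training sample to a single expert keeps the student's target unambiguous per task, while the group-mean baseline rewards only candidates that beat their siblings for the same instruction.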

Quantitative Comparison on PosterOmni-Bench

Model                Extending       Filling         Rescaling       Id-consistency  Layout-driven   Style-driven    Overall
ICEdit               1.99 / -        3.21 / -        1.73 / -        1.59 / -        1.53 / -        1.67 / -        1.95 / -
Step1X-Edit          3.04 / 3.67     4.35 / 4.21     1.60 / 1.75     1.70 / 2.14     1.63 / 1.82     1.57 / 1.79     2.31 / 2.56
BAGEL                2.33 / 2.84     2.77 / 2.67     1.77 / 1.40     1.92 / 2.29     2.34 / 3.03     1.85 / 2.34     2.15 / 2.43
OmniGen2             2.56 / -        2.32 / -        1.61 / -        3.25 / -        2.22 / -        1.84 / -        2.59 / -
FLUX.1 Kontext       3.12 / -        3.61 / -        3.16 / -        3.39 / -        3.03 / -        2.88 / -        3.20 / -
Qwen-Image-Edit      4.28 / 4.24     3.95 / 3.79     3.40 / 3.54     3.06 / 3.37     3.44 / 2.97     2.91 / 2.83     3.51 / 3.46
UniWorld-V2          4.25 / 4.22     3.57 / 3.18     3.07 / 3.23     2.87 / 3.20     3.66 / 3.79     3.14 / 2.85     3.42 / 3.41
Seedream-3.0         3.52 / 3.76     3.40 / 3.52     2.38 / 2.84     2.88 / 3.30     2.68 / 3.04     2.32 / 2.82     2.86 / 3.21
Seedream-4.0         4.41 / 4.57     4.44 / 4.64     4.00 / 3.69     4.53 / 4.62     4.05 / 4.22     4.23 / 4.31     4.28 / 4.34
PosterOmni (Ours)    4.76 / 4.72     4.69 / 4.77     3.97 / 3.81     3.98 / 4.23     4.20 / 4.35     3.99 / 4.36     4.27 / 4.37
vs. Baseline (Qwen)  +0.48 / +0.48   +0.74 / +0.98   +0.57 / +0.27   +0.92 / +0.86   +0.76 / +1.38   +1.08 / +1.53   +0.76 / +0.91

Table 1: Quantitative comparison results on PosterOmni-Bench. Gold indicates the best performance, and Blue indicates the second best.
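
The "vs. Baseline (Qwen)" row and the Overall column can be reproduced from the per-task scores. The sketch below copies the PosterOmni and Qwen-Image-Edit rows from Table 1 and assumes, as the arithmetic suggests but the page does not state, that Overall is the unweighted mean of the six task scores; the variable and function names are illustrative.

```python
# Per-task score pairs copied from Table 1, in column order:
# Extending, Filling, Rescaling, Id-consistency, Layout-driven, Style-driven.
posteromni = [(4.76, 4.72), (4.69, 4.77), (3.97, 3.81),
              (3.98, 4.23), (4.20, 4.35), (3.99, 4.36)]
qwen_image_edit = [(4.28, 4.24), (3.95, 3.79), (3.40, 3.54),
                   (3.06, 3.37), (3.44, 2.97), (2.91, 2.83)]


def deltas(ours, base):
    """Per-task improvement over the baseline, rounded to two decimals."""
    return [tuple(round(o - b, 2) for o, b in zip(ours_pair, base_pair))
            for ours_pair, base_pair in zip(ours, base)]


def overall(rows):
    """Unweighted mean over tasks for each of the two scores (an assumption)."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in range(2))


gains = deltas(posteromni, qwen_image_edit)
ours_overall = overall(posteromni)
base_overall = overall(qwen_image_edit)

print(gains)
print(ours_overall, base_overall)
```

Under that assumption the computed means agree with the reported Overall values for both rows to within rounding, and the per-task differences match the reported improvement row.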

Gallery