PosterOmni

Abstract

Image-to-poster generation is a multi-dimensional process coupling entity-preserving local editing (such as rescaling, filling, and extending) with concept-driven global creation (such as layout and style transfer).

We propose PosterOmni, a generalized framework that unifies these regimes via an efficient data–distillation–reward pipeline. Our approach constructs multi-scenario datasets covering six task types, distills knowledge from specialized experts into a single model, and applies Unified Reward Feedback to align outputs with aesthetic preferences. On PosterOmni-Bench, PosterOmni improves on its Qwen-Image-Edit baseline across all six tasks and scores on par with Seedream-4.0 overall (Table 1).

Task Demonstrations

Interactive examples showcasing PosterOmni's capabilities across diverse poster generation scenarios.

Shared Prompt:

A vibrant pop art poster for an IP limited edition product line, featuring a bold red sneaker with a graphic print on the left, a sleek blue smartphone case with a metallic logo in the center, and a gold collectible pin with a character design on the right—all set against a dynamic background of comic book dots and stripes. At the top is the title "IP Limited Edition Collection", and in the center is the text "Exclusive Drops Available Now".

↔️ Extending

Shared Prompt:

Three orange emergency kits, featuring reflective strips and black shoulder straps, neatly arranged. Made from durable, sturdy materials and brightly colored to enhance visibility.

🖌️ Filling

Instruction:

rescale this poster from 4:3 to 9:16

📐 Rescaling

Shared Prompt:

Refer to the layout of this poster and create a new poster featuring a large beige speaker cabinet filled with plush toys. Next to it, place a metal high stool and a paintbrush with a wooden handle. On the right side, include a cluster of blooming light purple 3D-printed models. Add the text "Smart Space: Innovative Living" at the top and "Collaborative Exploration, Enhanced Experience" at the bottom.

📋 Layout-driven

Shared Prompt:

Referencing the style of this poster, create an illustration featuring a colorful library at the center of the image, with a cartoon robot in the foreground, a beam-of-light tunnel at the upper right, and a giant Rubik's cube at the bottom. Include the text "Future Tech Playground" at the top and "Explore Infinite Possibilities" at the center of the image.

🎭 Style-driven

Shared Prompt:

3D anime-inspired poster with intricate details and vivid lighting, exuding a sense of aesthetic beauty. On the left side of the image, a lush potted bamboo plant features vibrant, dew-kissed leaves gently swaying in the breeze, slender green stalks reaching upward, and a pot that embodies antiquity and elegance. In the lower right corner, a uniquely shaped cactus stands resilient in a simple terracotta pot, its surface covered with fine fuzz and sharp spines. The background showcases a bright, transparent indoor greenhouse space, where soft natural light pours in through the top and sides, creating a warm and tranquil atmosphere. The air is filled with the fresh scent of greenery, accompanied by a misty green glow and delicate particles of dust dancing in the sunlight. The entire scene conveys a sense of ethereal, semi-transparent structure, as if you are immersed in a vibrant, dreamlike oasis. Main title: "Bursting with Vitality" Style: Color: Vibrant green; energetic style; extended lettering with smooth strokes to create a refreshing, premium feel. Subtitle: "Breathe Purely, Feel Alive" Style: Color: Soft ivory; slender, elegant font strokes to deliver serenity and a sense of healing.

👤 Identity-driven

Capabilities

Diverse Poster Creation Tasks

Local Editing Precision

Performs precise local adjustments including extending, filling, rescaling, and identity-driven generation while preserving the original subject.

Global Creation Reasoning

Handles abstract high-level tasks such as layout-driven and style-driven generation, ensuring aesthetic coherence across the entire poster.

Unified Framework

Seamlessly integrates multiple editing and generation capabilities into a single model without switching pipelines.

Data Engineering

Automated Data Construction

Prompt & Image Generation

Leverages GPT-4 and Qwen to generate diverse, structured prompts and initial images covering various themes.

Multimodal Filtering

Employs OCR and VLM-based filtering to ensure textual correctness and layout-content consistency.
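As a rough illustration of the OCR side of this filter, the sketch below checks that every quoted string in a prompt shows up in the OCR output of the generated poster. It is a minimal sketch under stated assumptions, not the project's actual filter: the function names, the 0.9 similarity threshold, and the idea of passing OCR results in as a plain list of strings are all hypothetical, and no real OCR engine is called.

```python
import re
from difflib import SequenceMatcher


def requested_texts(prompt):
    """Return the strings the prompt asks to render (text inside double quotes)."""
    return re.findall(r'"([^"]+)"', prompt)


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(text.split())


def passes_text_check(prompt, ocr_lines, threshold=0.9):
    """Keep a sample only if each requested string closely matches some OCR line.

    `ocr_lines` stands in for the output of whatever OCR engine is used
    (assumption). A prompt with no quoted text passes trivially.
    """
    detected = [normalize(line) for line in ocr_lines]
    for target in requested_texts(prompt):
        wanted = normalize(target)
        best = max(
            (SequenceMatcher(None, wanted, d).ratio() for d in detected),
            default=0.0,
        )
        if best < threshold:
            return False
    return True


prompt = (
    'Add the text "Smart Space: Innovative Living" at the top and '
    '"Collaborative Exploration, Enhanced Experience" at the bottom.'
)
ocr_ok = ["SMART SPACE: INNOVATIVE LIVING",
          "Collaborative Exploration, Enhanced Experience"]
ocr_missing = ["SMART SPACE: INNOVATIVE LIVING"]
print(passes_text_check(prompt, ocr_ok))       # both strings found
print(passes_text_check(prompt, ocr_missing))  # second string absent
```

Fuzzy matching rather than exact equality is a design choice here: OCR output often differs from the prompt in case and punctuation even when the rendered text is correct.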

Task-Specific Construction

Automatically synthesizes paired data for the six tasks using tools like SAM-2 and BrushNet.
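For the extending task, one plausible way to synthesize a training pair is to crop the center of a finished poster and treat the full poster as the target. The sketch below illustrates that idea only; it is an assumption about how such pairs could be built, it does not use SAM-2 or BrushNet, and `make_extending_pair` and `keep_ratio` are hypothetical names.

```python
def make_extending_pair(poster, keep_ratio=0.5):
    """Build an (input_canvas, mask, target) triple from a full poster.

    `poster` is a 2D list of pixel values. The centered window covering
    `keep_ratio` of each side is kept as the visible input; everything
    outside it is masked (mask == 1) and must be generated by the model.
    """
    h, w = len(poster), len(poster[0])
    kh = max(1, round(h * keep_ratio))
    kw = max(1, round(w * keep_ratio))
    top, left = (h - kh) // 2, (w - kw) // 2

    mask = [
        [0 if top <= r < top + kh and left <= c < left + kw else 1
         for c in range(w)]
        for r in range(h)
    ]
    # Masked pixels are set to None as a placeholder for "to be generated".
    canvas = [
        [poster[r][c] if mask[r][c] == 0 else None for c in range(w)]
        for r in range(h)
    ]
    return canvas, mask, poster


# Toy 4x4 "poster" whose pixel value encodes its position.
poster = [[r * 4 + c for c in range(4)] for r in range(4)]
canvas, mask, target = make_extending_pair(poster, keep_ratio=0.5)
```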

Methodology

Progressive Training Pipeline

Task-Specific SFT

Trains specialized experts for local editing and global creation to ensure high fidelity in distinct domains.

Task Distillation

Distills knowledge from experts into a unified student model, merging pixel precision with aesthetic understanding.
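The routing idea behind this step can be sketched as follows: each training sample is sent to the expert that matches its task family, and the student is penalized for deviating from that expert's output. This is a minimal sketch under assumptions: the output-matching MSE objective, the task name strings, and the callable signatures are illustrative and not confirmed by the source. The task grouping follows the Capabilities section.

```python
LOCAL_TASKS = {"extending", "filling", "rescaling", "identity"}
GLOBAL_TASKS = {"layout", "style"}


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pick_teacher(task, local_expert, global_expert):
    """Route a task to the expert trained on its family."""
    if task in LOCAL_TASKS:
        return local_expert
    if task in GLOBAL_TASKS:
        return global_expert
    raise ValueError(f"unknown task: {task}")


def distillation_loss(batch, student, local_expert, global_expert):
    """Average student-vs-teacher error over a mixed-task batch.

    `batch` is a list of (task, input_vector) pairs; the experts map an
    input vector to an output vector, and the student also sees the task.
    """
    total = 0.0
    for task, x in batch:
        teacher = pick_teacher(task, local_expert, global_expert)
        total += mse(student(task, x), teacher(x))
    return total / len(batch)


# Toy stand-ins for real models (assumption: outputs are plain vectors).
def local_expert(x):
    return [v * 2 for v in x]


def global_expert(x):
    return [v * 2 + 1 for v in x]


def student(task, x):
    return [v * 2 for v in x]


batch = [("filling", [1.0, 2.0]), ("style", [0.0, 1.0])]
loss = distillation_loss(batch, student, local_expert, global_expert)
```

In this toy setup the student already matches the local expert, so the whole loss comes from the style sample, which is what pushes a unified student toward the global expert's behavior on global tasks.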

Unified Reward Feedback

Aligns with human preferences using a reward model that evaluates both aesthetic appeal and instruction adherence.
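One simple way to picture a reward that covers both criteria is a weighted combination of an aesthetic score and an instruction-adherence score. The sketch below is illustrative only: the source does not state how the two signals are combined, so the weighted sum, the [0, 1] score range, and the names `Scores`, `unified_reward`, and `w_aesthetic` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    aesthetic: float   # assumed to lie in [0, 1]
    adherence: float   # assumed to lie in [0, 1]


def unified_reward(scores, w_aesthetic=0.5):
    """Blend the two criteria into one scalar (hypothetical weighting)."""
    if not 0.0 <= w_aesthetic <= 1.0:
        raise ValueError("w_aesthetic must be in [0, 1]")
    return w_aesthetic * scores.aesthetic + (1.0 - w_aesthetic) * scores.adherence


def rank_candidates(candidates, w_aesthetic=0.5):
    """Return candidate names sorted from highest to lowest reward."""
    return sorted(
        candidates,
        key=lambda name: unified_reward(candidates[name], w_aesthetic),
        reverse=True,
    )


candidates = {
    "pretty_but_off_prompt": Scores(aesthetic=0.9, adherence=0.2),
    "faithful_and_decent": Scores(aesthetic=0.6, adherence=0.8),
}
ranking = rank_candidates(candidates)
```

With equal weights the faithful candidate ranks first, which shows why scoring aesthetics alone would favor attractive outputs that ignore the instruction.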

Omni-Edit RL

Uses Reinforcement Learning to refine generation quality and align it with professional design standards.

Quantitative Comparison on PosterOmni-Bench

Model	Extending	Filling	Rescaling	Id-consistency	Layout-driven	Style-driven	Overall
ICEdit	1.99 / -	3.21 / -	1.73 / -	1.59 / -	1.53 / -	1.67 / -	1.95 / -
Step1X-Edit	3.04 / 3.67	4.35 / 4.21	1.60 / 1.75	1.70 / 2.14	1.63 / 1.82	1.57 / 1.79	2.31 / 2.56
BAGEL	2.33 / 2.84	2.77 / 2.67	1.77 / 1.40	1.92 / 2.29	2.34 / 3.03	1.85 / 2.34	2.15 / 2.43
OmniGen2	2.56 / -	2.32 / -	1.61 / -	3.25 / -	2.22 / -	1.84 / -	2.59 / -
FLUX.1 Kontext	3.12 / -	3.61 / -	3.16 / -	3.39 / -	3.03 / -	2.88 / -	3.20 / -
Qwen-Image-Edit	4.28 / 4.24	3.95 / 3.79	3.40 / 3.54	3.06 / 3.37	3.44 / 2.97	2.91 / 2.83	3.51 / 3.46
UniWorld-V2	4.25 / 4.22	3.57 / 3.18	3.07 / 3.23	2.87 / 3.20	3.66 / 3.79	3.14 / 2.85	3.42 / 3.41
Seedream-3.0	3.52 / 3.76	3.40 / 3.52	2.38 / 2.84	2.88 / 3.30	2.68 / 3.04	2.32 / 2.82	2.86 / 3.21
Seedream-4.0	4.41 / 4.57	4.44 / 4.64	4.00 / 3.69	4.53 / 4.62	4.05 / 4.22	4.23 / 4.31	4.28 / 4.34
PosterOmni (Ours)	4.76 / 4.72	4.69 / 4.77	3.97 / 3.81	3.98 / 4.23	4.20 / 4.35	3.99 / 4.36	4.27 / 4.37
vs. Baseline (Qwen)	+0.48 / +0.48	+0.74 / +0.98	+0.57 / +0.27	+0.92 / +0.86	+0.76 / +1.38	+1.08 / +1.53	+0.76 / +0.91

Table 1: Quantitative comparison results on PosterOmni-Bench. Gold indicates the best performance, and Blue indicates the second best.
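The "vs. Baseline (Qwen)" row can be reproduced by subtracting the Qwen-Image-Edit row from the PosterOmni row, cell by cell. The short script below does that arithmetic on the values copied from Table 1; the dictionary keys and function names are shorthand introduced here for illustration.

```python
POSTEROMNI = {
    "Extending": "4.76 / 4.72", "Filling": "4.69 / 4.77",
    "Rescaling": "3.97 / 3.81", "Id": "3.98 / 4.23",
    "Layout": "4.20 / 4.35", "Style": "3.99 / 4.36",
    "Overall": "4.27 / 4.37",
}
QWEN_IMAGE_EDIT = {
    "Extending": "4.28 / 4.24", "Filling": "3.95 / 3.79",
    "Rescaling": "3.40 / 3.54", "Id": "3.06 / 3.37",
    "Layout": "3.44 / 2.97", "Style": "2.91 / 2.83",
    "Overall": "3.51 / 3.46",
}


def parse_cell(cell):
    """Split an 'a / b' table cell into its two scores."""
    return [float(v) for v in cell.split("/")]


def delta_row(ours, baseline):
    """Per-column signed differences, formatted like the table's last row."""
    out = {}
    for task, cell in ours.items():
        diffs = [a - b for a, b in zip(parse_cell(cell), parse_cell(baseline[task]))]
        out[task] = " / ".join(f"{d:+.2f}" for d in diffs)
    return out


deltas = delta_row(POSTEROMNI, QWEN_IMAGE_EDIT)
for task, value in deltas.items():
    print(f"{task}: {value}")
```

The largest gains over the baseline fall on the style-driven and layout-driven columns, while rescaling shows the smallest improvement.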

Human Evaluation Results

Head-to-head comparison showcasing PosterOmni's Overall Preference win rates against state-of-the-art baseline models based on comprehensive human expert evaluation.

[Figure: six head-to-head comparison charts, PosterOmni vs BAGEL, FLUX.1 Kontext, Qwen-Image-Edit, UniWorld-V2, Seedream-3.0, and Seedream-4.0. Legend: PosterOmni Win Rate, Tie Rate, PosterOmni Lose Rate.]

Human experts evaluated PosterOmni against baseline models across multiple dimensions.
Results demonstrate PosterOmni's superior performance in aesthetic quality and task alignment.

Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
