PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Sixiang Chen¹,²,*, Jianyu Lai¹,*, Jialin Gao²,*, Tian Ye¹, Haoyu Chen¹, Hengyu Shi², Shitong Shao¹, Yunlong Lin³, Song Fei¹, Zhaohu Xing¹, Yeying Jin⁴, Junfeng Luo², Xiaoming Wei², Lei Zhu¹,⁵,†

¹The Hong Kong University of Science and Technology (Guangzhou), ²Meituan, ³Xiamen University, ⁴National University of Singapore, ⁵The Hong Kong University of Science and Technology

*Equal contribution, †Corresponding author

What is PosterCraft?

PosterCraft turns your prompts into high-quality aesthetic posters. It excels at precise text rendering, seamless integration of abstract art, striking layouts, and stylistic harmony.

⚡ Quick Prompt

✨ Simple Description

"Urban Canvas Street Art Expo poster with bold graffiti-style lettering and dynamic colorful splashes"

🎯 Detailed Prompt

🎨 Detailed Description

"This poster for the post-apocalyptic film "Echoes of the Shattered Sun" showcases a lone survivor in ragged clothes, standing on a desolate, cracked earth under a sky dominated by a fragmented, dying sun that casts long, eerie shadows. Ruined cityscapes are barely visible on the horizon. The style is bleak, atmospheric, and visually striking, emphasizing despair and a fight for survival. The film's title, "ECHOES OF THE SHATTERED SUN" is presented in a fragmented, futuristic, sans-serif font, the letters appearing as if broken and pieced together from salvaged metal, with a faint, dying orange glow. This text is positioned horizontally across the top of the poster, large and ominous. Below the survivor, the release information "THE FUTURE IS BROKEN. SURVIVAL IS ALL THAT REMAINS. COMING SOON" is in a smaller, gritty, white stencil font, horizontally centered. The fractured, thematic title amplifying the film's dystopian and survivalist themes."

Specific Datasets for PosterCraft

Dive into the diverse, purpose-built datasets that support the training workflow for high-quality aesthetic poster generation.

Text-Render-2M

A comprehensive text rendering dataset containing 2 million high-quality examples. Its main features are:

- multi-instance text rendering;
- diverse text configurations that vary in size, count, placement, and rotation;
- content generated from both templates and random strings.

This makes it essential for developing robust text rendering in poster generation.
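To make "diverse text configurations" concrete, here is a minimal sketch of how one synthetic text-layout spec could be sampled. It is not the dataset's actual generator.

- `TEMPLATES`, `EVENTS`, `TextInstance`, and `sample_layout` are hypothetical names.
- All value ranges are illustrative assumptions.
- Each instance mixes template-filled and random-string content with random size, position, and rotation.

```python
import random
import string
from dataclasses import dataclass
from typing import List

# Hypothetical templates and slot fillers, for illustration only.
TEMPLATES = ["{event} {year}", "Welcome to {event}", "{event}: Tickets On Sale"]
EVENTS = ["Jazz Night", "Tech Expo", "Art Fair"]


@dataclass
class TextInstance:
    text: str
    font_size: int   # pixels
    x: float         # normalized horizontal anchor in [0, 1]
    y: float         # normalized vertical anchor in [0, 1]
    rotation: float  # degrees


def random_text(rng: random.Random, min_len: int = 3, max_len: int = 12) -> str:
    """Random uppercase/digit string: content with no language prior."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(min_len, max_len)))


def template_text(rng: random.Random) -> str:
    """Fill a randomly chosen template; str.format ignores unused slots."""
    return rng.choice(TEMPLATES).format(event=rng.choice(EVENTS), year=rng.randint(2020, 2030))


def sample_layout(rng: random.Random, max_instances: int = 4,
                  template_prob: float = 0.5) -> List[TextInstance]:
    """Sample 1..max_instances text instances with varied size, position and rotation."""
    instances = []
    for _ in range(rng.randint(1, max_instances)):
        text = template_text(rng) if rng.random() < template_prob else random_text(rng)
        # Most text stays horizontal; the rest gets a small random tilt.
        rotation = 0.0 if rng.random() < 0.6 else rng.uniform(-30.0, 30.0)
        instances.append(TextInstance(
            text=text,
            font_size=rng.randint(24, 160),
            x=rng.uniform(0.05, 0.95),
            y=rng.uniform(0.05, 0.95),
            rotation=rotation,
        ))
    return instances


if __name__ == "__main__":
    for inst in sample_layout(random.Random(0)):
        print(inst)
```

Passing an explicit `random.Random` instance keeps each sampled layout reproducible from its seed. That property is convenient when the same spec has to be re-rendered onto a different background.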

HQ-Poster-100K

A meticulously curated collection of 100,000 high-quality posters produced by a comprehensive processing pipeline. The pipeline incorporates:

- advanced filtering techniques (MD5 and hash-based);
- multi-modal scoring;
- Gemini-powered mask generation;
- detailed captioning.

The collection forms the foundation for training aesthetic poster generation models.
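MD5 filtering is typically used to drop byte-identical duplicates. Here is a minimal sketch of that step using Python's standard `hashlib`.

- The function names are hypothetical.
- The page does not describe how PosterCraft applies MD5 or which other hash it uses.
- MD5 only catches exact copies. Near-duplicates (resized or re-encoded images) would need a perceptual hash, which this sketch does not implement.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Tuple


def md5_hex(data: bytes) -> str:
    """MD5 digest of raw file bytes, as a hex string."""
    return hashlib.md5(data).hexdigest()


def dedup_exact(records: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Keep the first item for each distinct MD5 digest; return the kept names in order."""
    seen = set()
    kept = []
    for name, data in records:
        digest = md5_hex(data)
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept


def dedup_files(paths: Iterable[Path]) -> List[str]:
    """Convenience wrapper: hash files on disk (hypothetical helper)."""
    return dedup_exact((str(p), Path(p).read_bytes()) for p in paths)
```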

Poster-Preference-100K

A specialized preference-learning dataset of 100,000 poster images generated from user prompts. Each image is rigorously evaluated by advanced aesthetic evaluators and Gemini, and preference pairs are formed by distinguishing high-quality from low-quality examples. This process is crucial for learning nuanced aesthetic preferences and generating human-aligned posters.
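One simple way to turn per-image quality scores into preference pairs is to pair the best and worst image for each prompt. A pair is kept only when the score gap is large enough to be a reliable signal. This is a minimal sketch under stated assumptions:

- Each image already has a single combined score. How the page's aesthetic evaluators and Gemini judgments are merged is not specified.
- `min_margin` and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def make_preference_pair(scores: Dict[str, float],
                         min_margin: float = 0.5) -> Optional[Tuple[str, str]]:
    """Return (chosen_id, rejected_id) for one prompt, or None if the gap is too small."""
    if len(scores) < 2:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_id, best), (worst_id, worst) = ranked[0], ranked[-1]
    if best - worst < min_margin:
        return None  # ambiguous quality difference: skip this prompt
    return best_id, worst_id


def build_pairs(per_prompt_scores: Dict[str, Dict[str, float]],
                min_margin: float = 0.5) -> List[Tuple[str, str, str]]:
    """Collect (prompt, chosen_id, rejected_id) triples across prompts."""
    pairs = []
    for prompt, scores in per_prompt_scores.items():
        pair = make_preference_pair(scores, min_margin)
        if pair is not None:
            pairs.append((prompt, *pair))
    return pairs
```

The margin threshold trades quantity for reliability. A larger margin yields fewer pairs, but each pair expresses a clearer preference.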

Poster-Reflect-120K

This dataset is built from 120,000 posters grouped into reflection pairs. Each pair comes with a text reflection that analyzes the poster's content and aesthetic style. By aligning rich visual information with these text reflections, the dataset enables iterative vision-language feedback refinement. The model learns from both modalities to produce more aesthetically compelling posters.

Technical Framework

A unified optimization workflow for aesthetic poster generation, organized into four stages

Text Rendering Optimization

Targets accurate text generation. The model learns to render diverse text precisely on high-quality backgrounds while representing those backgrounds faithfully. This establishes the fidelity and robustness that poster generation builds on.

High-quality Poster Fine-tuning

Shifts focus to overall poster style and text-background harmony using Region-aware Calibration. This fine-tuning stage preserves text accuracy while strengthening the artistic integrity of the aesthetic poster.

Aesthetic-Text RL

Employs Aesthetic-Text Preference Optimization to capture higher-order aesthetic trade-offs. This reinforcement learning stage prioritizes outputs that satisfy holistic aesthetic criteria and mitigates defects in font rendering.
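The page does not give the exact Aesthetic-Text Preference Optimization objective. As a generic illustration of preference-based optimization, here is the standard DPO loss for a single (chosen, rejected) pair. This is a swapped-in, well-known technique, not necessarily PosterCraft's formulation.

- `beta` and the argument names are assumptions.
- Diffusion models have no exact log-likelihood. Diffusion-DPO variants substitute differences in denoising error for the log-probability terms.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss: -log sigmoid(beta * (policy gain on chosen - policy gain on rejected)).

    Each "gain" is the policy log-probability minus the frozen reference model's.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)
```

With no preference margin the loss equals log 2. It falls below that as the policy favors the chosen sample more than the reference model does, and rises above it in the opposite case.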

Vision-Language Feedback

Introduces a Joint Vision-Language Conditioning mechanism. This iterative feedback combines visual information with targeted text suggestions for multi-modal corrections, progressively refining aesthetic content and background harmony.
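The iterative feedback stage can be pictured as a generate-critique-regenerate loop. This minimal sketch assumes the following:

- `generate` and `critique` are hypothetical callables standing in for the poster model and the vision-language feedback source.
- Regeneration is conditioned on both the previous image and the text feedback.
- The loop stops when the critic returns no suggestion or a round budget runs out. The page does not specify the real stopping criterion.

```python
from typing import Any, Callable, List, Optional, Tuple

# generate(prompt, previous_image, feedback) -> image   (hypothetical signature)
Generator = Callable[[str, Optional[Any], Optional[str]], Any]
# critique(image) -> text suggestion, or None when satisfied   (hypothetical)
Critic = Callable[[Any], Optional[str]]


def refine(prompt: str, generate: Generator, critique: Critic,
           max_rounds: int = 3) -> Tuple[Any, List[Any]]:
    """Iteratively refine an image using joint image + text feedback."""
    image = generate(prompt, None, None)  # initial draft, no feedback yet
    history = [image]
    for _ in range(max_rounds):
        feedback = critique(image)
        if feedback is None:  # critic has no further suggestions
            break
        # Condition the next draft on both the previous image and the text suggestion.
        image = generate(prompt, image, feedback)
        history.append(image)
    return image, history
```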

Text Optimization Results

Experience the dramatic improvement in rendering accuracy and text alignment

Before-and-After Comparison

Witness the transformation from incorrect or missing text to accurate text rendering

Ocean Conservation Poster

An ocean conservation themed poster, beautiful coral reefs interspersed with plastic waste, creating a stark contrast. The warning is 'Protect Our Blue Planet, Act Now'.

Film Festival Poster

An independent film festival poster, featuring bold abstract film reel patterns with dynamic spotlight effects. The festival name 'Vanguard Visions' has a striking, unique design.

Library Archives Poster

A design containing two text elements: Gold text 'Seek Knowledge' oriented horizontally in the top center, and brown text 'Library Archives' oriented horizontally in the bottom center area.

Grey's Anatomy Poster

The title 'GREY'S ANATOMY,' set in large, stark white, the slogan 'Life changes in a heartbeat.' is written in elegant typography with information including 'THURSDAYS 9|8c' and 'PREMIERES SEPT 23'.

Demo Showcase

Advanced poster generation capabilities showcasing diverse long text rendering

Reinforcement Learning Results

Aesthetic-Text Preference Optimization improves poster quality in both higher-order aesthetics and text accuracy

Enhanced Visual Composition

Reinforcement learning significantly improves visual hierarchy and aesthetic appeal through preference-driven learning.

Improved Text Accuracy

Our approach improves text-rendering precision, reducing errors and redundant text for clearer visual communication.

Optimized Layout Structure

Reinforcement learning improves element positioning and spacing, making the layout more visually effective.

Experimental Results

Comprehensive evaluation demonstrates PosterCraft's superior performance across multiple dimensions

Model Performance Comparison

Quantitative text-rendering evaluation against state-of-the-art poster generation models. PosterCraft outperforms every open-source baseline and the closed-source Ideogram-v2 on all three metrics, trailing only Gemini2.0-Flash-Gen.

Method	Type	Text Recall ↑	Text F-score ↑	Text Accuracy ↑
OpenCOLE	open-source	0.082	0.076	0.061
Playground-v2.5	open-source	0.157	0.146	0.132
SD3.5	open-source	0.565	0.542	0.497
Flux1.dev	open-source	0.723	0.707	0.667
Ideogram-v2	closed-source	0.711	0.685	0.680
BAGEL	open-source	0.543	0.536	0.463
Gemini2.0-Flash-Gen	closed-source	0.798	0.786	0.746
PosterCraft (ours)	-	0.787	0.774	0.735
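The page does not define how Text Recall and Text F-score are computed. A common approach compares OCR output from the generated poster against the target text at the word level. This is a minimal sketch of that convention:

- Tokens are treated as multisets after lowercasing.
- Matched words are counted with multiset intersection.
- Punctuation handling and per-image averaging are assumptions.
- The function name is hypothetical.
- Text Accuracy is omitted because its definition is not given here.

```python
import re
from collections import Counter
from typing import Tuple


def _words(text: str) -> Counter:
    # Lowercase and keep word-like tokens (letters, digits, underscores, apostrophes).
    return Counter(re.findall(r"[\w']+", text.lower()))


def text_prf(predicted: str, reference: str) -> Tuple[float, float, float]:
    """Word-level precision, recall and F-score of OCR text against the target text."""
    pred, ref = _words(predicted), _words(reference)
    matched = sum((pred & ref).values())  # multiset intersection counts repeats correctly
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

For example, if OCR reads only "Protect Our Blue Planet" from a poster whose target text is "Protect Our Blue Planet, Act Now", precision is 1.0, recall is 4/6, and the F-score is 0.8. Benchmark scores would then be averaged over all test prompts.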

User Study Results

Human expert evaluation showcasing PosterCraft's win rate against baseline models across four critical dimensions

[Figure: overall win rate of PosterCraft in pairwise human evaluation against OpenCOLE (open), Playground-v2.5 (open), SD3.5 (open), Flux1.dev (open), Ideogram-v2 (closed), BAGEL (open), and Gemini2.0-Flash-Gen (closed).]

Note: The win-rate visualization shows PosterCraft's overall performance against each baseline model in a comprehensive human evaluation. Green bars indicate preference for PosterCraft; gray bars indicate preference for the opponent. Open and Closed denote open-source and closed-source models.
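A pairwise win rate is the share of judgments that favor each side. Here is a minimal sketch of the aggregation for one PosterCraft-vs-baseline comparison.

- Votes are assumed to be labeled `"ours"`, `"theirs"`, or `"tie"`. These labels are hypothetical.
- Whether ties were allowed in this study is not stated.
- If ties were excluded, only the first two labels would appear and the green/gray split would sum to 1.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("ours", "theirs", "tie")  # hypothetical vote labels


def win_rates(votes: Iterable[str]) -> Dict[str, float]:
    """Fraction of votes for each outcome in one pairwise comparison."""
    counts = Counter(votes)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected vote labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes to aggregate")
    return {label: counts[label] / total for label in LABELS}
```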