



PosterCraft:

Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework








Demo Video
What is PosterCraft?
From your prompts to high-quality aesthetic posters, PosterCraft excels in precise text rendering, seamless integration of abstract art, striking layouts, and stylistic harmony.
"Urban Canvas Street Art Expo poster with bold graffiti-style lettering and dynamic colorful splashes"

"This poster for the post-apocalyptic film "Echoes of the Shattered Sun" showcases a lone survivor in ragged clothes, standing on a desolate, cracked earth under a sky dominated by a fragmented, dying sun that casts long, eerie shadows. Ruined cityscapes are barely visible on the horizon. The style is bleak, atmospheric, and visually striking, emphasizing despair and a fight for survival. The film's title, "ECHOES OF THE SHATTERED SUN" is presented in a fragmented, futuristic, sans-serif font, the letters appearing as if broken and pieced together from salvaged metal, with a faint, dying orange glow. This text is positioned horizontally across the top of the poster, large and ominous. Below the survivor, the release information "THE FUTURE IS BROKEN. SURVIVAL IS ALL THAT REMAINS. COMING SOON" is in a smaller, gritty, white stencil font, horizontally centered. The fractured, thematic title amplifying the film's dystopian and survivalist themes."

Specific Datasets for PosterCraft
Explore the specialized datasets that support the training workflow for high-quality aesthetic poster generation.
Text-Render-2M
A comprehensive text rendering dataset containing 2 million high-quality examples. Features multi-instance text rendering, diverse text selections (varying in size, count, placement, and rotation), and dynamic content generation through both template-based and random string approaches. Essential for developing robust text rendering capabilities in poster generation.
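
As a rough illustration of the sampling described above, the sketch below draws a few text-instance specifications per background, mixing template-based and random strings and varying size, count, placement, and rotation. The templates, word list, value ranges, and the function names `random_string`, `template_string`, and `sample_text_instances` are all hypothetical; the page does not describe the actual generation code.

```python
import random
import string

# Hypothetical templates and vocabulary; the real ones are not given on this page.
TEMPLATES = ["{word} Festival", "Grand Opening: {word}", "Save the {word}"]
WORDS = ["Ocean", "Jazz", "Harvest", "Robotics"]


def random_string(rng, min_len=3, max_len=10):
    """Return a random alphanumeric string (the 'random string' approach)."""
    length = rng.randint(min_len, max_len)
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def template_string(rng):
    """Fill a randomly chosen template (the 'template-based' approach)."""
    return rng.choice(TEMPLATES).format(word=rng.choice(WORDS))


def sample_text_instances(rng, max_instances=4, canvas=(1024, 1024)):
    """Sample specs for the text instances to render on one background."""
    instances = []
    for _ in range(rng.randint(1, max_instances)):
        use_template = rng.random() < 0.5
        instances.append({
            "text": template_string(rng) if use_template else random_string(rng),
            "font_size": rng.randint(24, 120),       # assumed range
            "x": rng.randint(0, canvas[0] - 1),       # anchor position in pixels
            "y": rng.randint(0, canvas[1] - 1),
            "rotation_deg": rng.randint(-45, 45),     # assumed range
            "source": "template" if use_template else "random",
        })
    return instances


rng = random.Random(0)
for spec in sample_text_instances(rng):
    print(spec)
```

Each spec would then be passed to a renderer that draws the text on a background image; that step is omitted here.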

HQ-Poster-100K
A meticulously curated collection of 100,000 high-quality posters with a comprehensive processing pipeline. Incorporates advanced filtering techniques (MD5, Hash), multi-modal scoring systems, Gemini-powered mask generation, and detailed captions. Forms the foundation for training aesthetic poster generation models.
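
The MD5 filtering step mentioned above can be sketched as exact-duplicate removal over file contents. This is a minimal example under assumptions: the function names and the in-memory `(name, bytes)` input are illustrative, and the second "Hash" filter, the scoring, and the mask generation are not covered.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte payload."""
    return hashlib.md5(data).hexdigest()


def deduplicate(items):
    """Keep the first item for each distinct payload.

    items: iterable of (name, payload_bytes) tuples.
    Returns the names that survive, in input order.
    """
    seen = set()
    kept = []
    for name, payload in items:
        digest = md5_digest(payload)
        if digest in seen:
            continue  # byte-identical to something already kept
        seen.add(digest)
        kept.append(name)
    return kept


posters = [
    ("a.png", b"poster-1"),
    ("b.png", b"poster-2"),
    ("a_copy.png", b"poster-1"),  # exact copy of a.png
]
print(deduplicate(posters))  # ['a.png', 'b.png']
```

MD5 only catches byte-identical files. Resized or re-encoded copies need a different kind of hash, which is outside this sketch.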

Poster-Preference-100K
A specialized preference learning dataset with 100,000 poster images. These images, generated from user prompts, undergo a rigorous evaluation using advanced aesthetic evaluators and Gemini to form preference pairs by distinguishing between high- and low-quality examples. This process is crucial for learning nuanced aesthetic preferences and generating human-aligned posters.
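
One simple way to turn scored images into preference pairs is sketched below. It assumes each image generated for a prompt already has a single scalar quality score. The `min_gap` threshold and the function name are hypothetical, and how the page's evaluators and Gemini are actually combined is not specified.

```python
from itertools import combinations


def build_preference_pairs(candidates, min_gap=0.1):
    """Form (chosen, rejected) pairs from scored images of one prompt.

    candidates: list of (image_id, score) tuples.
    min_gap: pairs whose scores differ by less than this are skipped,
             since the preference would be ambiguous (assumed threshold).
    """
    pairs = []
    for (id_a, score_a), (id_b, score_b) in combinations(candidates, 2):
        if abs(score_a - score_b) < min_gap:
            continue
        if score_a > score_b:
            pairs.append((id_a, id_b))
        else:
            pairs.append((id_b, id_a))
    return pairs


scored = [("img_0", 0.82), ("img_1", 0.35), ("img_2", 0.80)]
print(build_preference_pairs(scored))
# [('img_0', 'img_1'), ('img_2', 'img_1')]
```

Here `img_0` and `img_2` are too close in score to form a pair, so only the two clear high-versus-low comparisons are kept.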

Poster-Reflect-120K
This dataset is built from 120,000 posters, which are used to form reflection pairs. Each pair is accompanied by a text reflection that analyzes poster content and aesthetic style. By aligning rich visual information with these text reflections, the dataset enables iterative vision–language feedback refinement, allowing the model to learn from both modalities and produce more aesthetically compelling posters.

Technical Framework
A unified optimization workflow for aesthetic poster generation through four critical stages

Text Rendering Optimization
Targets accurate text generation by rendering diverse text precisely on high-quality backgrounds while keeping the backgrounds themselves faithful. This stage establishes the foundational fidelity and robustness for poster generation.
High-quality Poster Fine-tuning
Shifts focus to overall poster style and text-background harmony using Region-aware Calibration. This fine-tuning stage preserves text accuracy while strengthening the artistic integrity of the aesthetic poster.
Aesthetic-Text RL
Employs Aesthetic-Text Preference Optimization to capture higher-order aesthetic trade-offs. This reinforcement learning stage prioritizes outputs that satisfy holistic aesthetic criteria and mitigates defects in font rendering.
Vision-Language Feedback
Introduces a Joint Vision-Language Conditioning mechanism. This iterative feedback combines visual information with targeted text suggestions for multi-modal corrections, progressively refining aesthetic content and background harmony.
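
The page does not say how Region-aware Calibration in the fine-tuning stage is implemented. One plausible reading is a reconstruction loss that weights text regions and background regions differently. The sketch below shows that idea on plain 2D lists. The function name, the mask convention, and both weights are assumptions for illustration, not the authors' formulation.

```python
def region_weighted_mse(pred, target, text_mask,
                        text_weight=1.0, background_weight=1.0):
    """Mean squared error with separate weights for text and non-text pixels.

    pred, target, text_mask: equally sized 2D lists of numbers.
    text_mask: 1 marks a text pixel, 0 marks background (assumed convention).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for pred_row, target_row, mask_row in zip(pred, target, text_mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            w = text_weight if m else background_weight
            weighted_sum += w * (p - t) ** 2
            weight_total += w
    # Normalize by the total weight so the result stays a weighted mean.
    return weighted_sum / weight_total if weight_total else 0.0


pred = [[1.0, 0.0], [0.0, 0.0]]
target = [[0.0, 0.0], [0.0, 2.0]]
mask = [[1, 0], [0, 0]]  # only the top-left pixel is text

print(region_weighted_mse(pred, target, mask))                   # 1.25
print(region_weighted_mse(pred, target, mask, text_weight=0.5))
```

With equal weights this reduces to ordinary mean squared error. Changing `text_weight` relative to `background_weight` shifts how much the text pixels contribute to the training signal.
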
Text Optimization Results
Experience the dramatic improvement in rendering accuracy and text alignment
Pre- and Post-Optimization Comparison
Witness the transformation from incorrect/missing text rendering to accurate text generation


Ocean Conservation Poster
An ocean conservation themed poster, beautiful coral reefs interspersed with plastic waste, creating a stark contrast. The warning is 'Protect Our Blue Planet, Act Now'.


Film Festival Poster
An independent film festival poster, featuring bold abstract film reel patterns with dynamic spotlight effects. The festival name 'Vanguard Visions' has a striking, unique design.


Library Archives Poster
A design containing two text elements: Gold text 'Seek Knowledge' oriented horizontally in the top center, and brown text 'Library Archives' oriented horizontally in the bottom center area.


Grey's Anatomy Poster
The title 'GREY'S ANATOMY,' set in large, stark white, the slogan 'Life changes in a heartbeat.' is written in elegant typography with information including 'THURSDAYS 9|8c' and 'PREMIERES SEPT 23'.
Demo Showcase
Advanced poster generation capabilities showcasing diverse long text rendering




Reinforcement Learning Results
Aesthetic–Text Preference Optimization to improve poster quality through high-order aesthetics and text accuracy
Refinement with Reflection
See how PosterCraft utilizes vision-language reflection to enhance poster quality based on content and aesthetic suggestions.

Underworld
Poster Content Suggestions:
Recolor the main character's hair to a blonde tone … Adjust the main character's stance and body angle to face slightly towards the right side of the frame … Alter the way the main character holds the firearm, ensuring a firmer grip and directing the barrel towards the left side of the frame …
Aesthetic Style Optimization Suggestions:
Ensure consistent lighting and shadow placement on the character and environment based on the presumed single light source (the moon) and …


Harry Potter
Poster Content Suggestions:
Lower the position of the figure's hand holding the wand and reposition the wand to point more upwards, further away from the face …
Aesthetic Style Optimization Suggestions:
Enhance the dramatic lighting on the central figure, increasing contrast on the face to emphasize shadows and highlights … Introduce more textured cloud details in the sky, creating a moodier atmosphere. Increase the overall contrast and sharpness of the image to...

Experimental Results
Comprehensive evaluation demonstrates PosterCraft's superior performance across multiple dimensions
Model Performance Comparison
Quantitative evaluation across four critical dimensions, demonstrating PosterCraft's impressive performance against state-of-the-art poster generation models
Method | Type | Text Recall ↑ | Text F-score ↑ | Text Accuracy ↑ |
---|---|---|---|---|
OpenCOLE | Open | 0.082 | 0.076 | 0.061 |
Playground-v2.5 | Open | 0.157 | 0.146 | 0.132 |
SD3.5 | Open | 0.565 | 0.542 | 0.497 |
Flux1.dev | Open | 0.723 | 0.707 | 0.667 |
Ideogram-v2 | Close | 0.711 | 0.685 | 0.680 |
BAGEL | Open | 0.543 | 0.536 | 0.463 |
Gemini2.0-Flash-Gen | Close | 0.798 | 0.786 | 0.746 |
PosterCraft (ours) | | 0.787 | 0.774 | 0.735 |
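
The page does not define how the text metrics above are computed. As a rough guide to what recall and F-score measure for rendered text, the sketch below compares the words requested in a prompt with the words recognized in a generated poster (for example by an OCR tool, not shown). Word-level matching, lowercasing, and the function names are assumptions, and Text Accuracy is left out because its definition is not given.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def text_metrics(target_text, recognized_text):
    """Word-level recall, precision, and F-score between two strings."""
    target = Counter(tokenize(target_text))
    recognized = Counter(tokenize(recognized_text))
    matched = sum((target & recognized).values())  # multiset intersection
    n_target = sum(target.values())
    n_recognized = sum(recognized.values())
    recall = matched / n_target if n_target else 0.0
    precision = matched / n_recognized if n_recognized else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f_score": f_score}


# The poster rendered five of the six requested words.
print(text_metrics("Protect Our Blue Planet, Act Now",
                   "protect our blue planet act"))
```

In this example recall is 5/6 because one requested word is missing, precision is 1.0 because no extra words were rendered, and the F-score is their harmonic mean, 10/11.
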
User Study Results
Human expert evaluation showcasing PosterCraft's win rate against baseline models across four critical dimensions
Note: Win rate visualization represents PosterCraft's overall performance against each baseline model in comprehensive human evaluation. Green bars show PosterCraft preference, gray bars show opponent preference. Open and Close denote open-source and closed-source models.
Platform Gallery
Discover endless creative possibilities across diverse artistic styles and themes