Despite remarkable progress in Text-to-Image models, many real-world applications require generating coherent image sets with diverse consistency requirements. Existing consistency-oriented methods typically target a single domain and a narrow aspect of consistency, which significantly limits their generalizability to broader applications. In this paper, we formulate a more challenging problem, Text-to-ImageSet (T2IS) generation, which aims to generate sets of images that meet various consistency requirements specified in user instructions. To study this problem systematically, we first introduce T2IS-Bench, with 596 diverse instructions across 26 subcategories, providing comprehensive coverage for T2IS generation. Building on this, we propose T2IS-Eval, an evaluation framework that transforms user instructions into multifaceted assessment criteria and employs effective evaluators to adaptively assess how well each generated set fulfills these criteria. We then propose AutoT2IS, a training-free framework that maximally leverages the in-context capabilities of pretrained Diffusion Transformers to harmonize visual elements, satisfying both image-level prompt alignment and set-level visual consistency. Extensive experiments on T2IS-Bench reveal that diverse consistency requirements challenge all existing methods, while AutoT2IS significantly outperforms current generalized and even specialized approaches. Our method also enables numerous underexplored real-world applications, confirming its substantial practical value. All our data and code will be publicly available.
We introduce T2IS-Bench, a benchmark of 596 user instructions designed to comprehensively assess image-set generation against real-world consistency requirements. The accompanying pie chart shows the balanced distribution of these instructions across 26 distinct subcategories, highlighting the benchmark's broad coverage and its value as an evaluation resource for generative models.
| Model | Aesthetics | Prompt Alignment: Entity | Prompt Alignment: Attribute | Prompt Alignment: Relation | Visual Consistency: Identity | Visual Consistency: Style | Visual Consistency: Logic | Avg. |
|---|---|---|---|---|---|---|---|---|
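The abstract describes T2IS-Eval only at a high level: user instructions are turned into assessment criteria, and evaluators judge how well a generated set fulfills them. The Python sketch below illustrates one possible way that criterion-level judgments could roll up into the columns of the results table. It is hypothetical throughout. The `Criterion` structure, the [0, 1] score scale, per-sub-dimension averaging, and computing Avg. as an unweighted mean of the sub-dimension columns are all illustrative assumptions, not the paper's scoring rule. The evaluator that actually produces each score is left out.

```python
from dataclasses import dataclass
from statistics import mean

# Column grouping copied from the results-table header.
DIMENSIONS = {
    "Aesthetics": ["Aesthetics"],
    "Prompt Alignment": ["Entity", "Attribute", "Relation"],
    "Visual Consistency": ["Identity", "Style", "Logic"],
}


@dataclass
class Criterion:
    """One assessment question derived from a user instruction (hypothetical structure)."""
    sub_dimension: str  # one of the sub-dimension names in DIMENSIONS
    question: str       # natural-language check posed to an evaluator
    score: float        # evaluator judgment, assumed to lie in [0, 1]


def aggregate(criteria: list[Criterion]) -> dict[str, float]:
    """Average scores per sub-dimension, then take an unweighted mean as "Avg." (assumed rule)."""
    columns: dict[str, float] = {}
    for subs in DIMENSIONS.values():
        for sub in subs:
            scores = [c.score for c in criteria if c.sub_dimension == sub]
            if scores:  # an instruction may not yield criteria for every sub-dimension
                columns[sub] = mean(scores)
    if not columns:
        raise ValueError("no criteria matched any known sub-dimension")
    columns["Avg."] = mean(list(columns.values()))
    return columns


# Usage: made-up criteria for the 3D animated hedgehog instruction in the gallery.
criteria = [
    Criterion("Entity", "Is a hedgehog present in every image?", 1.0),
    Criterion("Attribute", "Does the last image show a flower crown?", 0.5),
    Criterion("Identity", "Is it the same hedgehog across all images?", 0.8),
    Criterion("Style", "Do all images share the 3D animated style?", 0.6),
]
result = aggregate(criteria)
print(result)
```

Skipping sub-dimensions that an instruction never probes, rather than scoring them as zero, is a deliberate choice in this sketch. It keeps, for example, a style-only instruction from being penalized on Relation.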
Generate multiple 3D animated images of a happy hedgehog. The images should include settings such as being in a cozy nest, dressed in a miniature jacket, wearing a small collar, dressed in a festive outfit, and wearing a flower crown.
Generate multiple watercolor illustrations of a playful puppy. The images should include settings such as playing in the yard, swimming, wearing a training harness, chasing a ball, and wearing a small sweater.
This is a children's picture book illustration generation task consisting of 5 pages, titled "The Lion and the Clever Mouse." Scene and character IDs need to remain consistent throughout the book to ensure stylistic uniformity and…
Please generate a brave knight character in a realistic style. He is wearing shiny silver armor, with a determined face, holding a greatsword. The first image shows him standing with both hands holding the sword across his chest...
Please generate a martial artist character in a traditional ink sketch style. He wears a sleeveless gi with a black belt, his muscular frame highlighting years of training. The first image shows him standing on one foot, arms...
Please generate a masked ninja character in a monochrome ink brush style. He wears a traditional shinobi outfit with a katana strap. The first image shows him standing on one foot, arms extended in a balanced pose; the second image...
Generate product mockups featuring a vintage stamp-inspired logo that embodies old-world travel and exploration, with subtle distressed textures and classic serif typography. Apply the logo on 4 travel-related items: a leather luggage tag...
Create product mockups using an organic, hand-drawn logo design that combines minimalist botanical line art with a clean modern typeface, evoking natural purity. Feature this logo on 4 eco-friendly products: a reusable water bottle...
Generate product mockups featuring a handcrafted, rustic logo inspired by traditional woodcarving techniques. The logo should showcase intricate wood grain textures paired with vintage typography. Apply this logo on 4 artisanal...
What are 5 steps for cooking pork shoulder steaks on a Weber Kettle Grill, including an image and brief description for each step?
Please provide 5 steps for making Sweet Boondhi, along with an image and a brief description for each step.
Please provide a detailed guide on how to air fry bacon, including 5 steps. For each step, generate an image and include a brief description.
Produce 4 images for a romantic movie. [SCENE-1] Young couple awkwardly bump into each other at a bookstore, books tumble to floor as they exchange glances. Use soft focus and warm tones to create a dreamy effect. [SCENE-2]...
Produce 4 images for a musical movie. [SCENE-1] Street performer begins singing, camera captures passionate expression as crowd gathers. Use a medium shot to capture his performance. [SCENE-2] Pull back to show growing...
Design 4 images for a superhero origin movie. [SCENE-1] Scientist works late in lab with experimental equipment and glowing screens. Use cool lighting to create a technological feel. [SCENE-2] Accident occurs with bright flashes and...
Generate a detailed 4-stage image series illustrating the construction process of a modern skyscraper. Stage 1: Foundation excavation and groundwork preparation. Stage 2: Erection of the steel framework. Stage 3: Installation...
Generate a detailed 4-stage image series illustrating the construction process of a community center. Stage 1: Community planning and site clearing. Stage 2: Laying the foundation and constructing the basic structural framework...
Create a detailed 4-stage image series illustrating the construction process of a high-tech corporate headquarters. Stage 1: Site clearing and blueprint planning. Stage 2: Foundation work and structural framework erection. Stage 3...
Please generate a scene of a flower blooming from bud to full blossom, containing 4 images arranged in chronological order, showing the flower's evolution from a closed bud to a fully opened blossom. All images must follow the natural...
Please generate a set of images depicting the growth of a sunflower from sprouting to full bloom. The first image shows a tiny sprout emerging from moist soil in the morning dew; the second image shows the young seedling with a few...
Please generate a set of images showing the transformation of a shapeshifting blob creature from a formless mass to a defined entity. The first image features a small, amorphous puddle of liquid-like material pulsating on the ground...
How is the volcanic eruption scientifically explained? Provide a detailed explanation of the principles behind the phenomenon, including relevant knowledge.
What is the detailed process behind the formation of honey, including the principles and relevant knowledge? Please provide 4 images to describe it.
Can you provide a comprehensive scientific explanation of the rainbow phenomenon, detailing the underlying principles and relevant knowledge?
Minimalist line style, using soft and smooth black lines to outline the subject. Each painting consists of only a few strokes with large areas of blank space, emphasizing simplicity and elegance. Generate 4 images with the subjects being a...
Please generate a set of 4 oil painting-style images depicting a bustling harbor. The first image shows the pier at dawn, with fishing boats setting out to sea and a light mist hanging over the water; the second image is set at noon, with...
Brush stroke style, imitating traditional Chinese ink painting, using black ink lines of varying thickness with slight smudging effects. Generate 4 images with the subjects being bamboo, plum blossom, pine tree, and lotus. Ensure the...
Generate a series of 4 classic comic book posters that share a unified, dynamic design style while spotlighting different iconic scenes. Unified Elements: Art Style & Colors: Bold, graphic design with primary colors (red, blue, yellow...
Generate a series of 4 abstract expressionism posters that share a unified, expressive style while exploring different emotional and visual themes. Unified Elements: Art Style & Colors: Bold, free-flowing brushstrokes with a vibrant...
Generate a series of 4 Botanical Wonderland posters that share a unified nature-inspired style while celebrating the beauty of flora. Use the following guidelines: Unified Elements: Art Style & Colors: Delicate, watercolor-inspired...
Please generate a set of images depicting major historical events of Ancient Greece in the 5th century BCE. The first image shows the Battle of Marathon during the Greco-Persian Wars, where Greek soldiers are fiercely resisting the...
Please generate a set of images depicting the expansion of the Mongol Empire from the 13th to 14th centuries. The first image shows Mongol horsemen galloping across the steppe, preparing to attack a neighboring tribe; the second image...
Please generate a set of images illustrating the Age of Exploration (15th-16th century). The first image shows Christopher Columbus's fleet encountering Caribbean islands with indigenous Taíno people observing from shore; the second...
Please generate a set of images depicting the growth of a real-world oak tree from seed to maturity. The first image shows an acorn buried in soft soil, surrounded by a few fallen leaves; the second image shows a young oak seedling just...
Please generate a set of images depicting the growth of a sunflower from sprouting to full bloom. The first image shows a tiny sprout emerging from moist soil in the morning dew; the second image shows the young seedling with a few leaves...
Please generate a set of images depicting the growth of a carnivorous Venus flytrap from seed to full maturity. The first image shows a tiny seed in wet soil inside a glass terrarium. The second image features a small sprout with its first...
The final artwork is a simple and charming line drawing of a snowman holding flowers. The steps to create it are: Sketch the Head and Hat – Draw a round head with a small face, adding a tilted hat for character. Outline the Body and Arms...
This artwork showcases the step-by-step creation of an adorable sheep figurine. The process unfolds as follows: Form the Base Shape – Start by molding a smooth, rounded base as the foundation. Construct the Fluffy Wool – Arrange small...
This illustration showcases the gradual creation of a cheerful character wearing glasses. The process unfolds as follows: Sketch the Basic Head Shape – Start with a circle and add facial guidelines to position the features. Define the Face...
Please provide 4 steps for chopping an onion, including a visual representation and brief description for each step.
Please provide 4 steps for making refrigerator dill pickles, including an image and a brief description for each step.
Could you provide a detailed guide on making cherry jam, including 4 key steps? For each step, please include an illustrative image and a concise description.
Please generate a scene of an apple falling from a tree, containing 4 images arranged in chronological order, showing the process of the apple detaching from the tree and contacting the ground. All images must follow the physical law of...
Please generate a scene of a ball rolling off a table, containing 4 images arranged in chronological order, showing the ball rolling from the edge of the table and landing on the ground. All images must follow the physical laws of inertia...
Generate four images illustrating the motion of a rocket launch. The first image shows the rocket on the launch pad, engines igniting. The second image depicts the rocket accelerating upwards, with exhaust gases pushing against the...
Generate product mockups featuring our new logo—a dynamic, minimalist swoosh reminiscent of Nike's iconic design that suggests speed and motion. Apply this logo across 4 products: Athletic running shoes, Performance sports t-shirt...
Generate product mockups featuring a futuristic, tech-inspired logo that integrates a subtle circuit board motif with a digital typeface. Use a monochromatic palette and apply the logo on 4 tech products: a VR headset, wireless...
Generate product mockups featuring a vintage stamp-inspired logo that embodies old-world travel and exploration, with subtle distressed textures and classic serif typography. Apply the logo on 4 travel-related items: a leather luggage...