{"ID":2867550,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17458","arxiv_id":"2509.17458","title":"CARINOX: Inference-time Scaling with Category-Aware Reward-based Initial Noise Optimization and Exploration","abstract":"Text-to-image diffusion models, such as Stable Diffusion, can produce high-quality and diverse images but often fail to achieve compositional alignment, particularly when prompts describe complex object relationships, attributes, or spatial arrangements. Recent inference-time approaches address this by optimizing or exploring the initial noise under the guidance of reward functions that score text-image alignment without requiring model fine-tuning. While promising, each strategy has intrinsic limitations when used alone: optimization can stall due to poor initialization or unfavorable search trajectories, whereas exploration may require a prohibitively large number of samples to locate a satisfactory output. Our analysis further shows that neither single reward metrics nor ad-hoc combinations reliably capture all aspects of compositionality, leading to weak or inconsistent guidance. To overcome these challenges, we present Category-Aware Reward-based Initial Noise Optimization and Exploration (CARINOX), a unified framework that combines noise optimization and exploration with a principled reward selection procedure grounded in correlation with human judgments. Evaluations on two complementary benchmarks covering diverse compositional challenges show that CARINOX raises average alignment scores by +16% on T2I-CompBench++ and +11% on the HRS benchmark, consistently outperforming state-of-the-art optimization and exploration-based methods across all major categories, while preserving image quality and diversity.\nThe project page is available at https://amirkasaei.com/carinox/.","short_abstract":"Text-to-image diffusion models, such as Stable Diffusion, can produce high-quality and diverse images but often fail to achieve compositional alignment, particularly when prompts describe complex object relationships, attributes, or spatial arrangements. Recent inference-time approaches address this by optimizing or ex...","url_abs":"https://arxiv.org/abs/2509.17458","url_pdf":"https://arxiv.org/pdf/2509.17458v3","authors":"[\"Seyed Amir Kasaei\",\"Ali Aghayari\",\"Arash Marioriyad\",\"Niki Sepasian\",\"Shayan Baghayi Nejad\",\"MohammadAmin Fazli\",\"Mahdieh Soleymani Baghshah\",\"Mohammad Hossein Rohban\"]","published":"2025-09-22T07:51:28Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\"]","methods":"[\"Diffusion Model\",\"LoRA\"]","project_urls":"[\"https://amirkasaei.com/carinox/\"]","has_code":false,"code_links":[{"ID":609478,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867550,"paper_url":"https://arxiv.org/abs/2509.17458","paper_title":"CARINOX: Inference-time Scaling with Category-Aware Reward-based Initial Noise Optimization and Exploration","repo_url":"https://github.com/amirkasaei/carinox","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}