{"ID":2827221,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.17851","arxiv_id":"2512.17851","title":"InfSplign: Inference-Time Spatial Alignment of Text-to-Image Diffusion Models","abstract":"Text-to-image (T2I) diffusion models generate high-quality images but often fail to capture the spatial relations specified in text prompts. This limitation can be traced to two factors: lack of fine-grained spatial supervision in training data and inability of text embeddings to encode spatial semantics. We introduce InfSplign, a training-free inference-time method that improves spatial alignment by adjusting the noise through a compound loss in every denoising step. Proposed loss leverages different levels of cross-attention maps extracted from the backbone decoder to enforce accurate object placement and a balanced object presence during sampling. The method is lightweight, plug-and-play, and compatible with any diffusion backbone. Our comprehensive evaluations on VISOR and T2I-CompBench show that InfSplign establishes a new state-of-the-art (to the best of our knowledge), achieving substantial performance gains over the strongest existing inference-time baselines and even outperforming the fine-tuning-based methods. Codebase is available at GitHub.","short_abstract":"Text-to-image (T2I) diffusion models generate high-quality images but often fail to capture the spatial relations specified in text prompts. This limitation can be traced to two factors: lack of fine-grained spatial supervision in training data and inability of text embeddings to encode spatial semantics. We introduce...","url_abs":"https://arxiv.org/abs/2512.17851","url_pdf":"https://arxiv.org/pdf/2512.17851v2","authors":"[\"Sarah Rastegar\",\"Violeta Chatalbasheva\",\"Sieger Falkena\",\"Anuj Singh\",\"Yanbo Wang\",\"Tejas Gokhale\",\"Hamid Palangi\",\"Hadi Jamali-Rad\"]","published":"2025-12-19T17:52:43Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Diffusion Model\"]","has_code":false}
