{"ID":2862304,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.01399","arxiv_id":"2510.01399","title":"Resolving the Identity Crisis in Text-to-Image Generation","abstract":"State-of-the-art text-to-image models suffer from a persistent identity crisis when generating scenes with multiple humans: producing duplicate faces, merging identities, and miscounting individuals. We present DisCo (Reinforcement with Diversity Constraints), a reinforcement learning framework that directly optimizes identity diversity both within images and across groups of generated samples. DisCo fine-tunes flow-matching models using Group-Relative Policy Optimization (GRPO), guided by a compositional reward that: (i) penalizes facial similarity within images, (ii) discourages identity repetition across samples, (iii) enforces accurate person counts, and (iv) preserves visual fidelity and prompt alignment via human preference scores. A single-stage curriculum stabilizes training as prompt complexity increases. Importantly, this method does not require any real data. On the DiverseHumans Testset, DisCo achieves 98.6% Unique Face Accuracy and near-perfect Global Identity Spread, outperforming open-source and proprietary models (e.g., Gemini, GPT-Image) while maintaining perceptual quality. Our results establish cross-sample diversity as a critical axis for resolving identity collapse, positioning DisCo as a scalable, annotation-free solution for multi-human image synthesis. Project page: https://qualcomm-ai-research.github.io/disco/","short_abstract":"State-of-the-art text-to-image models suffer from a persistent identity crisis when generating scenes with multiple humans: producing duplicate faces, merging identities, and miscounting individuals. We present DisCo (Reinforcement with Diversity Constraints), a reinforcement learning framework that directly optimizes...","url_abs":"https://arxiv.org/abs/2510.01399","url_pdf":"https://arxiv.org/pdf/2510.01399v3","authors":"[\"Shubhankar Borse\",\"Farzad Farhadzadeh\",\"Munawar Hayat\",\"Fatih Porikli\"]","published":"2025-10-01T19:28:51Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
