{"ID":2878080,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.18966","arxiv_id":"2508.18966","title":"USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning","abstract":"Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework because they ultimately concern the disentanglement and re-composition of content and style, a long-standing theme in style-driven research. To this end, we present USO, a Unified Style-Subject Optimized customization model. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives, style-alignment training and content-style disentanglement training. Third, we incorporate a style reward-learning paradigm denoted as SRL to further enhance the model's performance. Finally, we release USO-Bench, the first benchmark that jointly evaluates style similarity and subject fidelity across multiple metrics. Extensive experiments demonstrate that USO achieves state-of-the-art performance among open-source models along both dimensions of subject consistency and style similarity. Code and model: https://github.com/bytedance/USO","short_abstract":"Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework because they u...","url_abs":"https://arxiv.org/abs/2508.18966","url_pdf":"https://arxiv.org/pdf/2508.18966v1","authors":"[\"Shaojin Wu\",\"Mengqi Huang\",\"Yufeng Cheng\",\"Wenxu Wu\",\"Jiahe Tian\",\"Yiming Luo\",\"Fei Ding\",\"Qian He\"]","published":"2025-08-26T12:10:24Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":610447,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2878080,"paper_url":"https://arxiv.org/abs/2508.18966","paper_title":"USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning","repo_url":"https://github.com/bytedance/USO","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
