{"ID":2826214,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19271","arxiv_id":"2512.19271","title":"3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory","abstract":"Recent image generation approaches often address subject, style, and structure-driven conditioning in isolation, leading to feature entanglement and limited task transferability. In this paper, we introduce 3SGen, a task-aware unified framework that performs all three conditioning modes within a single model. 3SGen employs an MLLM equipped with learnable semantic queries to align text-image semantics, complemented by a VAE branch that preserves fine-grained visual details. At its core, an Adaptive Task-specific Memory (ATM) module dynamically disentangles, stores, and retrieves condition-specific priors, such as identity for subjects, textures for styles, and spatial layouts for structures, via a lightweight gating mechanism along with several scalable memory items. This design mitigates inter-task interference and naturally scales to compositional inputs. In addition, we propose 3SGen-Bench, a unified image-driven generation benchmark with standardized metrics for evaluating cross-task fidelity and controllability. Extensive experiments on our proposed 3SGen-Bench and other public benchmarks demonstrate our superior performance across diverse image-driven generation tasks.","short_abstract":"Recent image generation approaches often address subject, style, and structure-driven conditioning in isolation, leading to feature entanglement and limited task transferability. In this paper, we introduce 3SGen, a task-aware unified framework that performs all three conditioning modes within a single model. \n3SGen emp...","url_abs":"https://arxiv.org/abs/2512.19271","url_pdf":"https://arxiv.org/pdf/2512.19271v1","authors":"[\"Xinyang Song\",\"Libin Wang\",\"Weining Wang\",\"Zhiwei Li\",\"Jianxin Sun\",\"Dandan Zheng\",\"Jingdong Chen\",\"Qi Li\",\"Zhenan Sun\"]","published":"2025-12-22T11:07:27Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Variational Autoencoder\"]","has_code":false}