{"ID":2887476,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01152","arxiv_id":"2508.01152","title":"LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation","abstract":"We present LawDIS, a language-window-based controllable dichotomous image segmentation (DIS) framework that produces high-quality object masks. Our framework recasts DIS as an image-conditioned mask generation task within a latent diffusion model, enabling seamless integration of user controls. LawDIS is enhanced with macro-to-micro control modes. Specifically, in macro mode, we introduce a language-controlled segmentation strategy (LS) to generate an initial mask based on user-provided language prompts. In micro mode, a window-controlled refinement strategy (WR) allows flexible refinement of user-defined regions (i.e., size-adjustable windows) within the initial mask. Coordinated by a mode switcher, these modes can operate independently or jointly, making the framework well-suited for high-accuracy, personalised applications. Extensive experiments on the DIS5K benchmark reveal that our LawDIS significantly outperforms 11 cutting-edge methods across all metrics. Notably, compared to the second-best model MVANet, we achieve $F_β^ω$ gains of 4.6\\% with both the LS and WR strategies and 3.6\\% gains with only the LS strategy on DIS-TE. Codes will be made available at https://github.com/XinyuYanTJU/LawDIS.","short_abstract":"We present LawDIS, a language-window-based controllable dichotomous image segmentation (DIS) framework that produces high-quality object masks. Our framework recasts DIS as an image-conditioned mask generation task within a latent diffusion model, enabling seamless integration of user controls. \nLawDIS is enhanced with...","url_abs":"https://arxiv.org/abs/2508.01152","url_pdf":"https://arxiv.org/pdf/2508.01152v1","authors":"[\"Xinyu Yan\",\"Meijun Sun\",\"Ge-Peng Ji\",\"Fahad Shahbaz Khan\",\"Salman Khan\",\"Deng-Ping Fan\"]","published":"2025-08-02T02:25:51Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":611440,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2887476,"paper_url":"https://arxiv.org/abs/2508.01152","paper_title":"LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation","repo_url":"https://github.com/XinyuYanTJU/LawDIS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
