{"ID":2884400,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06995","arxiv_id":"2508.06995","title":"S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything without Supervision","abstract":"Recent self-supervised image segmentation models have achieved promising performance on semantic segmentation and class-agnostic instance segmentation. However, their pretraining schedule is multi-stage, requiring a time-consuming pseudo-masks generation process between each training epoch. This time-consuming offline process not only makes it difficult to scale with training dataset size, but also leads to sub-optimal solutions due to its discontinuous optimization routine. To address these issues, we first present a novel pseudo-mask algorithm, Fast Universal Agglomerative Pooling (UniAP). Each layer of UniAP can identify groups of similar nodes in parallel, allowing it to generate semantic-level, instance-level, and multi-granular pseudo-masks within tens of milliseconds for one image. Based on the fast UniAP, we propose the Scalable Self-Supervised Universal Segmentation (S2-UniSeg), which employs a student and a momentum teacher for continuous pretraining. A novel segmentation-oriented pretext task, Query-wise Self-Distillation (QuerySD), is proposed to pretrain S2-UniSeg to learn the local-to-global correspondences. Under the same setting, S2-UniSeg outperforms the SOTA UnSAM model, achieving notable improvements of AP+6.9 on COCO, AR+11.1 on UVO, PixelAcc+4.5 on COCOStuff-27, and RQ+8.0 on Cityscapes. After scaling up to a larger 2M-image subset of SA-1B, S2-UniSeg further achieves performance gains on all four benchmarks.
Our code and pretrained models are available at https://github.com/bio-mlhui/S2-UniSeg","short_abstract":"Recent self-supervised image segmentation models have achieved promising performance on semantic segmentation and class-agnostic instance segmentation. However, their pretraining schedule is multi-stage, requiring a time-consuming pseudo-masks generation process between each training epoch. This time-consuming offline...","url_abs":"https://arxiv.org/abs/2508.06995","url_pdf":"https://arxiv.org/pdf/2508.06995v2","authors":"[\"Huihui Xu\",\"Jin Ye\",\"Hongqiu Wang\",\"Changkai Ji\",\"Jiashi Lin\",\"Ming Hu\",\"Ziyan Huang\",\"Ying Chen\",\"Chenglong Ma\",\"Tianbin Li\",\"Lihao Liu\",\"Junjun He\",\"Lei Zhu\"]","published":"2025-08-09T14:12:39Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":611077,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2884400,"paper_url":"https://arxiv.org/abs/2508.06995","paper_title":"S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything without Supervision","repo_url":"https://github.com/bio-mlhui/S2-UniSeg","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}