{"ID":2854702,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14792","arxiv_id":"2510.14792","title":"CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection","abstract":"Open-vocabulary object detection (OVD) aims to recognize and localize object categories beyond the training set. Recent approaches leverage vision-language models to generate pseudo-labels using image-text alignment, allowing detectors to generalize to unseen classes without explicit supervision. However, these methods depend heavily on single-step image-text matching, neglecting the intermediate reasoning steps crucial for interpreting semantically complex visual contexts, such as crowding or occlusion. In this paper, we introduce CoT-PL, a framework that incorporates visual chain-of-thought reasoning into the pseudo-labeling process for OVD. It decomposes complex scene understanding into three interpretable steps (object localization, category recognition, and background grounding), where these intermediate reasoning states serve as rich supervision sources. Extensive experiments on standard OVD evaluation protocols demonstrate that CoT-PL achieves state-of-the-art performance with superior pseudo-labeling efficiency, outperforming the strong baseline by 9.4 AP50 for novel classes on OV-COCO and improving box and mask APr by 3.2 and 2.2, respectively, on OV-LVIS. Code and models are available at https://github.com/hchoi256/cotpl.","short_abstract":"Open-vocabulary object detection (OVD) aims to recognize and localize object categories beyond the training set. Recent approaches leverage vision-language models to generate pseudo-labels using image-text alignment, allowing detectors to generalize to unseen classes without explicit supervision. However, these methods...","url_abs":"https://arxiv.org/abs/2510.14792","url_pdf":"https://arxiv.org/pdf/2510.14792v3","authors":"[\"Hojun Choi\",\"Youngsun Lim\",\"Jaeyo Shin\",\"Hyunjung Shim\"]","published":"2025-10-16T15:27:10Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":608184,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2854702,"paper_url":"https://arxiv.org/abs/2510.14792","paper_title":"CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection","repo_url":"https://github.com/hchoi256/cotpl","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}