{"ID":2824578,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22801","arxiv_id":"2512.22801","title":"Evaluating the Performance of Open-Vocabulary Object Detection in Low-quality Image","abstract":"Open-vocabulary object detection enables models to localize and recognize objects beyond a predefined set of categories and is expected to achieve recognition capabilities comparable to human performance. In this study, we aim to evaluate the performance of existing models on open-vocabulary object detection tasks under low-quality image conditions. For this purpose, we introduce a new dataset that simulates low-quality images in the real world. In our evaluation experiment, we find that although open-vocabulary object detection models exhibited no significant decrease in mAP scores under low-level image degradation, the performance of all models dropped sharply under high-level image degradation. OWLv2 models consistently performed better across different types of degradation, while OWL-ViT, GroundingDINO, and Detic showed significant performance declines. We will release our dataset and codes to facilitate future studies.","short_abstract":"Open-vocabulary object detection enables models to localize and recognize objects beyond a predefined set of categories and is expected to achieve recognition capabilities comparable to human performance. In this study, we aim to evaluate the performance of existing models on open-vocabulary object detection tasks unde...","url_abs":"https://arxiv.org/abs/2512.22801","url_pdf":"https://arxiv.org/pdf/2512.22801v2","authors":"[\"Po-Chih Wu\"]","published":"2025-12-28T06:18:22Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}