{"ID":2837147,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.20650","arxiv_id":"2511.20650","title":"MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities","abstract":"Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To bridge this gap, we introduce MedROV, the first Real-time Open Vocabulary detection model for medical imaging. To enable open-vocabulary learning, we curate a large-scale dataset, Omnis, with 600K detection samples across nine imaging modalities and introduce a pseudo-labeling strategy to handle missing annotations from multi-source datasets. Additionally, we enhance generalization by incorporating knowledge from a large pre-trained foundation model. By leveraging contrastive learning and cross-modal representations, MedROV effectively detects both known and novel structures. Experimental results demonstrate that MedROV outperforms the previous state-of-the-art foundation model for medical image detection with an average absolute improvement of 40 mAP50, and surpasses closed-set detectors by more than 3 mAP50, while running at 70 FPS, setting a new benchmark in medical detection. Our source code, dataset, and trained model are available at https://github.com/toobatehreem/MedROV.","short_abstract":"Traditional object detection models in medical imaging operate within a closed-set paradigm, limiting their ability to detect objects of novel labels. Open-vocabulary object detection (OVOD) addresses this limitation but remains underexplored in medical imaging due to dataset scarcity and weak text-image alignment. To...","url_abs":"https://arxiv.org/abs/2511.20650","url_pdf":"https://arxiv.org/pdf/2511.20650v1","authors":"[\"Tooba Tehreem Sheikh\",\"Jean Lahoud\",\"Rao Muhammad Anwer\",\"Fahad Shahbaz Khan\",\"Salman Khan\",\"Hisham Cholakkal\"]","published":"2025-11-25T18:59:53Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":606660,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2837147,"paper_url":"https://arxiv.org/abs/2511.20650","paper_title":"MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities","repo_url":"https://github.com/toobatehreem/MedROV","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
