{"ID":2842968,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09554","arxiv_id":"2511.09554","title":"RF-DETR: Neural Architecture Search for Real-Time Detection Transformers","abstract":"Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-training. Rather than simply fine-tuning a heavy-weight vision-language model (VLM) for new domains, we introduce RF-DETR, a light-weight specialist detection transformer that discovers accuracy-latency Pareto curves for any target dataset with weight-sharing neural architecture search (NAS). Our approach fine-tunes a pre-trained base network on a target dataset and evaluates thousands of network configurations with different accuracy-latency tradeoffs without re-training. Further, we revisit the \"tunable knobs\" for NAS to improve the transferability of DETRs to diverse target domains. Notably, RF-DETR significantly improves over prior state-of-the-art real-time methods on COCO and Roboflow100-VL. RF-DETR (nano) achieves 48.0 AP on COCO, beating D-FINE (nano) by 5.3 AP at similar latency, and RF-DETR (2x-large) outperforms GroundingDINO (tiny) by 1.2 AP on Roboflow100-VL while running 20x as fast. To the best of our knowledge, RF-DETR (2x-large) is the first real-time detector to surpass 60 AP on COCO. Our code is available at https://github.com/roboflow/rf-detr","short_abstract":"Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-training. Rather than simply fine-tuning a heavy-weight vision-language model (VLM) for new domains, we introduce RF-DETR, a light-weigh...","url_abs":"https://arxiv.org/abs/2511.09554","url_pdf":"https://arxiv.org/pdf/2511.09554v2","authors":"[\"Isaac Robinson\",\"Peter Robicheaux\",\"Matvei Popov\",\"Deva Ramanan\",\"Neehar Peri\"]","published":"2025-11-12T18:58:39Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false,"code_links":[{"ID":607171,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2842968,"paper_url":"https://arxiv.org/abs/2511.09554","paper_title":"RF-DETR: Neural Architecture Search for Real-Time Detection Transformers","repo_url":"https://github.com/roboflow/rf-detr","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
