{"ID":2846082,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.05565","arxiv_id":"2511.05565","title":"In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy","abstract":"Foundation vision-language models (VLMs) excel on natural images, but their utility for biomedical microscopy remains underexplored. In this paper, we investigate how in-context learning enables state-of-the-art VLMs to perform few-shot object detection when large annotated datasets are unavailable, as is often the case with microscopic images. We introduce the Micro-OD benchmark, a curated collection of 252 images specifically curated for in-context learning, with bounding-box annotations spanning 11 cell types across four sources, including two in-lab expert-annotated sets. We systematically evaluate eight VLMs under few-shot conditions and compare variants with and without implicit test-time reasoning tokens. We further implement a hybrid Few-Shot Object Detection (FSOD) pipeline that combines a detection head with a VLM-based few-shot classifier, which enhances the few-shot performance of recent VLMs on our benchmark. Across datasets, we observe that zero-shot performance is weak due to the domain gap; however, few-shot support consistently improves detection, with marginal gains achieved after six shots. We observe that models with reasoning tokens are more effective for end-to-end localization, whereas simpler variants are more suitable for classifying pre-localized crops. Our results highlight in-context adaptation as a practical path for microscopy, and our benchmark provides a reproducible testbed for advancing open-vocabulary detection in biomedical imaging.","short_abstract":"Foundation vision-language models (VLMs) excel on natural images, but their utility for biomedical microscopy remains underexplored. In this paper, we investigate how in-context learning enables state-of-the-art VLMs to perform few-shot object detection when large annotated datasets are unavailable, as is often the cas...","url_abs":"https://arxiv.org/abs/2511.05565","url_pdf":"https://arxiv.org/pdf/2511.05565v1","authors":"[\"Shreyan Ganguly\",\"Angona Biswas\",\"Jaydeep Rade\",\"Md Hasibul Hasan Hasib\",\"Nabila Masud\",\"Nitish Singla\",\"Abhipsa Dash\",\"Ushashi Bhattacharjee\",\"Aditya Balu\",\"Anwesha Sarkar\",\"Adarsh Krishnamurthy\",\"Soumik Sarkar\"]","published":"2025-11-04T06:06:02Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}