{"ID":2846947,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.00801","arxiv_id":"2511.00801","title":"Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing","abstract":"Medical image editing has emerged as a pivotal technology with broad applications in data augmentation, model interpretability, medical education, and treatment simulation. However, the lack of large-scale, high-quality, and openly accessible datasets tailored for medical contexts with strict anatomical and clinical constraints has significantly hindered progress in this domain. To bridge this gap, we introduce Med-Banana-50K, a comprehensive dataset of over 50k medically curated image edits spanning chest X-ray, brain MRI, and fundus photography across 23 diseases. Each sample supports bidirectional lesion editing (addition and removal) and is constructed using Gemini-2.5-Flash-Image based on real clinical images. A key differentiator of our dataset is the medically grounded quality control protocol: we employ an LLM-as-Judge evaluation framework with criteria such as instruction compliance, structural plausibility, image realism, and fidelity preservation, alongside iterative refinement over up to five rounds. Additionally, Med-Banana-50K includes around 37,000 failed editing attempts with full evaluation logs to support preference learning and alignment research. By offering a large-scale, medically rigorous, and fully documented resource, Med-Banana-50K establishes a critical foundation for developing and evaluating reliable medical image editing systems. Our dataset and code are publicly available. [https://github.com/richardChenzhihui/med-banana-50k].","short_abstract":"Medical image editing has emerged as a pivotal technology with broad applications in data augmentation, model interpretability, medical education, and treatment simulation. However, the lack of large-scale, high-quality, and openly accessible datasets tailored for medical contexts with strict anatomical and clinical co...","url_abs":"https://arxiv.org/abs/2511.00801","url_pdf":"https://arxiv.org/pdf/2511.00801v3","authors":"[\"Zhihui Chen\",\"Mengling Feng\"]","published":"2025-11-02T04:46:43Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.MM\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":607473,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2846947,"paper_url":"https://arxiv.org/abs/2511.00801","paper_title":"Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing","repo_url":"https://github.com/richardChenzhihui/med-banana-50k","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}