{"ID":2872636,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.08640","arxiv_id":"2509.08640","title":"RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts","abstract":"Chest radiographs (CXRs) are among the most common tests in medicine. Automated image interpretation may reduce radiologists' workload and expand access to diagnostic expertise. Deep learning multi-task and foundation models have shown strong performance for CXR interpretation but are vulnerable to shortcut learning, where models rely on spurious and off-target correlations rather than clinically relevant features to make decisions. We introduce RoentMod, a counterfactual image editing framework that generates anatomically realistic CXRs with user-specified, synthetic pathology while preserving unrelated anatomical features of the original scan. RoentMod combines an open-source medical image generator (RoentGen) with an image-to-image modification model without requiring retraining. In reader studies with board-certified radiologists and radiology residents, RoentMod-produced images appeared realistic in 93% of cases, correctly incorporated the specified finding in 89-99% of cases, and preserved native anatomy comparable to real follow-up CXRs. Using RoentMod, we demonstrate that state-of-the-art multi-task and foundation models frequently exploit off-target pathology as shortcuts, limiting their specificity. Incorporating RoentMod-generated counterfactual images during training mitigated this vulnerability, improving model discrimination across multiple pathologies by 3-19% AUC in internal validation and by 1-11% for 5 out of 6 tested pathologies in external testing. These findings establish RoentMod as a broadly applicable tool for probing and correcting shortcut learning in medical AI.
By enabling controlled counterfactual interventions, RoentMod enhances the robustness and interpretability of CXR interpretation models and provides a generalizable strategy for improving foundation models in medical imaging.","short_abstract":"Chest radiographs (CXRs) are among the most common tests in medicine. Automated image interpretation may reduce radiologists' workload and expand access to diagnostic expertise. Deep learning multi-task and foundation models have shown strong performance for CXR interpretation but are vulnerable to shortcut learning,...","url_abs":"https://arxiv.org/abs/2509.08640","url_pdf":"https://arxiv.org/pdf/2509.08640v1","authors":"[\"Lauren H. Cooke\",\"Matthias Jung\",\"Jan M. Brendel\",\"Nora M. Kerkovits\",\"Borek Foldyna\",\"Michael T. Lu\",\"Vineet K. Raghu\"]","published":"2025-09-10T14:35:24Z","proceeding":"eess.IV","tasks":"[\"eess.IV\",\"cs.AI\",\"cs.CV\"]","methods":"[]","has_code":false}