{"ID":2889324,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.22030","arxiv_id":"2507.22030","title":"ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports","abstract":"We introduce ReXGroundingCT, the first publicly available dataset linking free-text findings to pixel-level 3D segmentations in chest CT scans. The dataset includes 3,142 non-contrast chest CT scans paired with standardized radiology reports from CT-RATE. Construction followed a structured three-stage pipeline. First, GPT-4 was used to extract and standardize findings, descriptors, and metadata from reports originally written in Turkish and machine-translated into English. Second, GPT-4o-mini categorized each finding into a hierarchical ontology of lung and pleural abnormalities. Third, 3D annotations were produced for all CT volumes: the training set was quality-assured by board-certified radiologists, and the validation and test sets were fully annotated by board-certified radiologists. Additionally, a complementary chain-of-thought dataset was created to provide step-by-step hierarchical anatomical reasoning for localizing findings within the CT volume, using GPT-4o and localization coordinates derived from organ segmentation models. ReXGroundingCT contains 16,301 annotated entities across 8,028 text-to-3D-segmentation pairs, covering diverse radiological patterns from 3,142 non-contrast CT scans. About 79% of findings are focal abnormalities and 21% are non-focal. The dataset includes a public validation set of 50 cases and a private test set of 100 cases, both annotated by board-certified radiologists. The dataset establishes a foundation for enabling free-text finding segmentation and grounded radiology report generation in CT imaging. Model performance on the private test set is hosted on a public leaderboard at https://rexrank.ai/ReXGroundingCT.\nThe dataset is available at https://huggingface.co/datasets/rajpurkarlab/ReXGroundingCT.","short_abstract":"We introduce ReXGroundingCT, the first publicly available dataset linking free-text findings to pixel-level 3D segmentations in chest CT scans. The dataset includes 3,142 non-contrast chest CT scans paired with standardized radiology reports from CT-RATE. Construction followed a structured three-stage pipeline. First,...","url_abs":"https://arxiv.org/abs/2507.22030","url_pdf":"https://arxiv.org/pdf/2507.22030v2","authors":"[\"Mohammed Baharoon\",\"Luyang Luo\",\"Michael Moritz\",\"Abhinav Kumar\",\"Sung Eun Kim\",\"Xiaoman Zhang\",\"Miao Zhu\",\"Mahmoud Hussain Alabbad\",\"Maha Sbayel Alhazmi\",\"Neel P. Mistry\",\"Lucas Bijnens\",\"Kent Ryan Kleinschmidt\",\"Brady Chrisler\",\"Sathvik Suryadevara\",\"Sri Sai Dinesh Jaliparthi\",\"Noah Michael Prudlo\",\"Mark David Marino\",\"Jeremy Palacio\",\"Rithvik Akula\",\"Di Zhou\",\"Hong-Yu Zhou\",\"Ibrahim Ethem Hamamci\",\"Scott J. Adams\",\"Hassan Rayhan AlOmaish\",\"Pranav Rajpurkar\"]","published":"2025-07-29T17:27:15Z","proceeding":"eess.IV","tasks":"[\"eess.IV\",\"cs.AI\",\"cs.CV\"]","methods":"[\"Generative Adversarial Network\"]","project_urls":"[\"https://rexrank.ai/ReXGroundingCT\"]","has_code":false,"code_links":[{"ID":611640,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2889324,"paper_url":"https://arxiv.org/abs/2507.22030","paper_title":"ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports","repo_url":"https://github.com/microsoft/BiomedParse","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
