{"ID":2833891,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.10973","arxiv_id":"2512.10973","title":"TECM*: A Data-Driven Assessment to Reinforcement Learning Methods and Application to Heparin Treatment Strategy for Surgical Sepsis","abstract":"Objective: Sepsis is a life-threatening condition caused by severe infection leading to acute organ dysfunction. This study proposes a data-driven metric and a continuous reward function to optimize personalized heparin therapy in surgical sepsis patients. Methods: Data from the MIMIC-IV v1.0 and eICU v2.0 databases were used for model development and evaluation. The training cohort consisted of abdominal surgery patients receiving unfractionated heparin (UFH) after postoperative sepsis onset. We introduce a new RL-based framework: converting the discrete SOFA score to a continuous cxSOFA for more nuanced state and reward functions; Second, defining \"good\" or \"bad\" strategies based on cxSOFA by a stepwise manner; Third, proposing a Treatment Effect Comparison Matrix (TECM), analogous to a confusion matrix for classification tasks, to evaluate the treatment strategies. We applied different RL algorithms, Q-Learning, DQN, DDQN, BCQ and CQL to optimize the treatment and comprehensively evaluated the framework. Results: Among the AI-derived strategies, the cxSOFA-CQL model achieved the best performance, reducing mortality from 1.83% to 0.74% with the average hospital stay from 11.11 to 9.42 days. TECM demonstrated consistent outcomes across models, highlighting robustness. Conclusion: The proposed RL framework enables interpretable and robust optimization of heparin therapy in surgical sepsis. Continuous cxSOFA scoring and TECM-based evaluation provide nuanced treatment assessment, showing promise for improving clinical outcomes and decision-support reliability.","short_abstract":"Objective: Sepsis is a life-threatening condition caused by severe infection leading to acute organ dysfunction. This study proposes a data-driven metric and a continuous reward function to optimize personalized heparin therapy in surgical sepsis patients. Methods: Data from the MIMIC-IV v1.0 and eICU v2.0 databases we...","url_abs":"https://arxiv.org/abs/2512.10973","url_pdf":"https://arxiv.org/pdf/2512.10973v1","authors":"[\"Jiang Liu\",\"Yujie Li\",\"Chan Zhou\",\"Yihao Xie\",\"Qilong Sun\",\"Xin Shu\",\"Peiwei Li\",\"Chunyong Yang\",\"Yiziting Zhu\",\"Jiaqi Zhu\",\"Yuwen Chen\",\"Bo An\",\"Hao Wu\",\"Bin Yi\"]","published":"2025-12-02T08:38:07Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Generative Adversarial Network\"]","has_code":false}