{"ID":2868873,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16291","arxiv_id":"2509.16291","title":"Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management","abstract":"Care coordination and population health management programs serve large Medicaid and safety-net populations and must be auditable, efficient, and adaptable. While clinical risk for outreach modalities is typically low, time and opportunity costs differ substantially across text, phone, video, and in-person visits. We propose a lightweight offline reinforcement learning (RL) approach that augments trained policies with (i) test-time learning via local neighborhood calibration, and (ii) inference-time deliberation via a small Q-ensemble that incorporates predictive uncertainty and time/effort cost. The method exposes transparent dials for neighborhood size and uncertainty/cost penalties and preserves an auditable training pipeline. Evaluated on a de-identified operational dataset, TTL+ITD achieves stable value estimates with predictable efficiency trade-offs and subgroup auditing.","short_abstract":"Care coordination and population health management programs serve large Medicaid and safety-net populations and must be auditable, efficient, and adaptable. While clinical risk for outreach modalities is typically low, time and opportunity costs differ substantially across text, phone, video, and in-person visits. We p...","url_abs":"https://arxiv.org/abs/2509.16291","url_pdf":"https://arxiv.org/pdf/2509.16291v1","authors":"[\"Sanjay Basu\",\"Sadiq Y. Patel\",\"Parth Sheth\",\"Bhairavi Muralidharan\",\"Namrata Elamaran\",\"Aakriti Kinra\",\"Rajaie Batniji\"]","published":"2025-09-19T14:41:47Z","proceeding":"cs.CY","tasks":"[\"cs.CY\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}