{"ID":2894597,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.11780","arxiv_id":"2507.11780","title":"Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing","abstract":"Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference fail to be directly applicable. Many existing works circumvent non-differentiability by making the unrealistic assumption of zero probability of treatment non-response, i.e. that every unit responds (either positively or negatively) to an assigned treatment. Further, works that don't circumvent this restriction rely on refitting nuisance models a number of times proportional to the sample size. In this paper, we construct and analyze a simple, softmax smoothing-based estimator for the value of an optimal treatment policy. Our estimator applies in both static and dynamic treatment regimes, only requires fitting a constant number of nuisance models, and is statistically efficient when there is zero probability of non-response to treatment. Also, while our estimator does not require making semi-parametric restrictions, it can exploit them when they exist. We further show how our softmax smoothing approach can be used to estimate general parameters that are specified as a maximum of scores involving nuisance components, and look at conditional Balke and Pearl bounds and $L^1$ calibration error as salient examples.","short_abstract":"Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is...","url_abs":"https://arxiv.org/abs/2507.11780","url_pdf":"https://arxiv.org/pdf/2507.11780v2","authors":"[\"Justin Whitehouse\",\"Qizhao Chen\",\"Morgane Austern\",\"Vasilis Syrgkanis\"]","published":"2025-07-15T22:38:39Z","proceeding":"econ.EM","tasks":"[\"econ.EM\",\"cs.LG\",\"math.ST\",\"stat.ME\"]","methods":"[]","has_code":false}
