{"ID":2896286,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.08109","arxiv_id":"2507.08109","title":"Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing","abstract":"The advent of language models (LMs) has the potential to dramatically accelerate tasks that may be cast to text-processing; however, real-world adoption is hindered by concerns regarding safety, explainability, and bias. How can we responsibly leverage LMs in a transparent, auditable manner -- minimizing risk and allowing human experts to focus on informed decision-making rather than data-processing or prompt engineering? In this work, we propose a framework for declaring statically typed, LM-powered subroutines (i.e., callable, function-like procedures) for use within conventional asynchronous code -- such that sparse feedback from human experts is used to improve the performance of each subroutine online (i.e., during use). In our implementation, all LM-produced artifacts (i.e., prompts, inputs, outputs, and data-dependencies) are recorded and exposed to audit on demand. We package this framework as a library to support its adoption and continued development. While this framework may be applicable across several real-world decision workflows (e.g., in healthcare and legal fields), we evaluate it in the context of public comment processing as mandated by the 1969 National Environmental Policy Act (NEPA): Specifically, we use this framework to develop \"CommentNEPA,\" an application that compiles, organizes, and summarizes a corpus of public commentary submitted in response to a project requiring environmental review. 
We quantitatively evaluate the application by comparing its outputs (when operating without human feedback) to historical \"ground-truth\" data as labelled by human annotators during the preparation of official environmental impact statements.","short_abstract":"The advent of language models (LMs) has the potential to dramatically accelerate tasks that may be cast to text-processing; however, real-world adoption is hindered by concerns regarding safety, explainability, and bias. How can we responsibly leverage LMs in a transparent, auditable manner -- minimizing risk and allow...","url_abs":"https://arxiv.org/abs/2507.08109","url_pdf":"https://arxiv.org/pdf/2507.08109v1","authors":"[\"Reilly Raab\",\"Mike Parker\",\"Dan Nally\",\"Sadie Montgomery\",\"Anastasia Bernat\",\"Sai Munikoti\",\"Sameera Horawalavithana\"]","published":"2025-07-10T18:52:09Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}
