{"ID":2872278,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09602","arxiv_id":"2509.09602","title":"LAVA: Language Model Assisted Verbal Autopsy for Cause-of-Death Determination","abstract":"Verbal autopsy (VA) is a critical tool for estimating causes of death in resource-limited settings where medical certification is unavailable. This study presents LA-VA, a proof-of-concept pipeline that combines Large Language Models (LLMs) with traditional algorithmic approaches and embedding-based classification for improved cause-of-death prediction. Using the Population Health Metrics Research Consortium (PHMRC) dataset across three age categories (Adult: 7,580; Child: 1,960; Neonate: 2,438), we evaluate multiple approaches: GPT-5 predictions, LCVA baseline, text embeddings, and meta-learner ensembles. Our results demonstrate that GPT-5 achieves the highest individual performance with average test site accuracies of 48.6% (Adult), 50.5% (Child), and 53.5% (Neonate), outperforming traditional statistical machine learning baselines by 5-10%. Our findings suggest that simple off-the-shelf LLM-assisted approaches could substantially improve verbal autopsy accuracy, with important implications for global health surveillance in low-resource settings.","short_abstract":"Verbal autopsy (VA) is a critical tool for estimating causes of death in resource-limited settings where medical certification is unavailable. This study presents LA-VA, a proof-of-concept pipeline that combines Large Language Models (LLMs) with traditional algorithmic approaches and embedding-based classification for...","url_abs":"https://arxiv.org/abs/2509.09602","url_pdf":"https://arxiv.org/pdf/2509.09602v1","authors":"[\"Yiqun T. Chen\",\"Tyler H. McCormick\",\"Li Liu\",\"Abhirup Datta\"]","published":"2025-09-11T16:42:22Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"stat.AP\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}