{"ID":2831447,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07079","arxiv_id":"2512.07079","title":"Procrustean Bed for AI-Driven Retrosynthesis: A Unified Framework for Reproducible Evaluation","abstract":"Progress in computer-aided synthesis planning (CASP) is obscured by the lack of standardized evaluation infrastructure and the reliance on metrics that prioritize topological completion over chemical validity. We introduce RetroCast, a unified evaluation suite that standardizes heterogeneous model outputs into a common schema to enable statistically rigorous, apples-to-apples comparison. The framework includes a reproducible benchmarking pipeline with stratified sampling and bootstrapped confidence intervals, accompanied by SynthArena, an interactive platform for qualitative route inspection. We utilize this infrastructure to evaluate leading search-based and sequence-based algorithms on a new suite of standardized benchmarks. Our analysis reveals a divergence between \"solvability\" (stock-termination rate) and route quality; high solvability scores often mask chemical invalidity or fail to correlate with the reproduction of experimental ground truths. Furthermore, we identify a \"complexity cliff\" in which search-based methods, despite high solvability rates, exhibit a sharp performance decay in reconstructing long-range synthetic plans compared to sequence-based approaches. We release the full framework, benchmark definitions, and a standardized database of model predictions to support transparent and reproducible development in the field.","short_abstract":"Progress in computer-aided synthesis planning (CASP) is obscured by the lack of standardized evaluation infrastructure and the reliance on metrics that prioritize topological completion over chemical validity. We introduce RetroCast, a unified evaluation suite that standardizes heterogeneous model outputs into a common...","url_abs":"https://arxiv.org/abs/2512.07079","url_pdf":"https://arxiv.org/pdf/2512.07079v1","authors":"[\"Anton Morgunov\",\"Victor S. Batista\"]","published":"2025-12-08T01:26:39Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[]","has_code":false}