{"ID":2839172,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16551","arxiv_id":"2511.16551","title":"Toward Valid Generative Clinical Trial Data with Survival Endpoints","abstract":"Clinical trials face mounting challenges: fragmented patient populations, slow enrollment, and unsustainable costs, particularly for late phase trials in oncology and rare diseases. While external control arms built from real-world data have been explored, a promising alternative is the generation of synthetic control arms using generative AI. A central challenge is the generation of time-to-event outcomes, which constitute primary endpoints in oncology and rare disease trials, but are difficult to model under censoring and small sample sizes. Existing generative approaches, largely GAN-based, are data-hungry, unstable, and rely on strong assumptions such as independent censoring. We introduce a variational autoencoder (VAE) that jointly generates mixed-type covariates and survival outcomes within a unified latent variable framework, without assuming independent censoring. Across synthetic and real trial datasets, we evaluate our model in two realistic scenarios: (i) data sharing under privacy constraints, where synthetic controls substitute for original data, and (ii) control-arm augmentation, where synthetic patients mitigate imbalances between treated and control groups. Our method outperforms GAN baselines on fidelity, utility, and privacy metrics, while revealing systematic miscalibration of type I error and power. We propose a post-generation selection procedure that improves calibration, highlighting both progress and open challenges for generative survival modeling.","short_abstract":"Clinical trials face mounting challenges: fragmented patient populations, slow enrollment, and unsustainable costs, particularly for late phase trials in oncology and rare diseases. While external control arms built from real-world data have been explored, a promising alternative is the generation of synthetic control...","url_abs":"https://arxiv.org/abs/2511.16551","url_pdf":"https://arxiv.org/pdf/2511.16551v1","authors":"[\"Perrine Chassat\",\"Van Tuan Nguyen\",\"Lucas Ducrot\",\"Emilie Lanoy\",\"Agathe Guilloux\"]","published":"2025-11-20T17:03:38Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.AP\",\"stat.ME\",\"stat.ML\"]","methods":"[\"Large Language Model\",\"Generative Adversarial Network\",\"Variational Autoencoder\"]","has_code":false}
