{"ID":2899482,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.02980","arxiv_id":"2507.02980","title":"Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations","abstract":"We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expression, overlooking stochasticity inherent in single-cell data. In contrast, we offer a more realistic view of cellular responses by modeling expression distributions. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics, such as variance, skewness, and kurtosis, at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). While modeling a richer output space, the method remains competitive in predicting mean expression changes. This work offers a practical step towards more expressive and biologically informative models of perturbation effects.","short_abstract":"We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expres...","url_abs":"https://arxiv.org/abs/2507.02980","url_pdf":"https://arxiv.org/pdf/2507.02980v1","authors":"[\"Kalyan Ramakrishnan\",\"Jonathan G. Hedley\",\"Sisi Qu\",\"Puneet K. Dokania\",\"Philip H. S. Torr\",\"Cesar A. Prada-Medina\",\"Julien Fauqueur\",\"Kaspar Martens\"]","published":"2025-07-01T06:04:28Z","proceeding":"q-bio.GN","tasks":"[\"q-bio.GN\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}